We analyze the statistical properties of nonparametric regression
estimators using covariates which are not directly observable, but
have to be estimated from data in a preliminary step. These so-called
generated covariates appear in numerous applications, including two- 
stage nonparametric regression, estimation of simultaneous equation 
models or censored regression models. Yet so far there seems to be 
no general theory for their impact on the final estimator's statistical 
properties. Our paper provides such results. We derive a stochas- 
tic expansion that characterizes the influence of the generation step 
on the final estimator, and use it to derive rates of consistency and 
asymptotic distributions accounting for the presence of generated co- 
variates. 

1. Introduction. A wide range of statistical applications requires non- 
parametric estimation of a regression function when some of the covari- 
ates are not directly observed, but have themselves only been estimated in 
a (possibly nonparametric) preliminary step. Examples include triangular 
simultaneous equation models [e.g., Newey, Powell and Vella (1999), Blun- 
dell and Powell (2004), Imbens and Newey (2009)], sample selection models 
[Das, Newey and Vella (2003)], treatment effect models [Heckman, Ichimura 
and Todd (1998), Heckman and Vytlacil (2005)], censored regression mod- 
els [Lewbel and Linton (2002)], generalized Roy models [d'Haultfoeuille and 
Maurel (2009)], stochastic volatility models [Kanaya and Kristensen (2009)] 
and GARCH-in-Mean models [Conrad and Mammen (2009)], amongst many 
others. In contrast to fully parametric settings [Pagan (1984)], there seem
to be no general theoretical results on how to derive the statistical properties
of such nonparametric two-step estimators. Instead, most available results
in the literature typically exploit peculiarities of a specific model, and can
thus not easily be transferred to other applications. 

In this paper, we study the statistical properties of a nonparametric
estimator $\hat m_{LL}$ of a conditional mean function
$m_0(x) = E(Y|r_0(S) = x)$ when the function $r_0$ is unknown, but can be
estimated from data. While we are specific about estimating $m_0$ by local
linear regression [Fan and Gijbels (1996)] to simplify technical arguments,
we neither require the generated regressors $\hat R = \hat r(S)$ to emerge
from a specific type of model, nor do we require a specific procedure to
estimate them. We only impose high-level conditions on the accuracy and
complexity of the first step estimate. In particular, our main result holds
irrespective of whether the function $r_0$ is, for example, a density,
a conditional mean function or a quantile regression function, or whether
it is estimated by kernel methods, orthogonal series or sieves. Moreover,
our results are not confined to nonparametrically generated covariates, but
also apply in settings where $r_0$ is estimated using parametric or
semiparametric restrictions.

Our main result uses techniques from empirical process theory to show
that the presence of generated covariates affects the first-order asymptotic
properties of $\hat m_{LL}$ only through a smoothed version of the estimation
error $\hat r(s) - r_0(s)$. This additional smoothing typically improves the
rate of convergence of the estimator's stochastic part, reducing the "curse
of dimensionality" from estimating $r_0$ to a secondary concern in this
context. It does not, however, affect the order of magnitude of the
deterministic component. Still, the estimator $\hat m_{LL}$ can have a faster
overall rate of convergence than the first step estimator $\hat r$ if the
latter has a sufficiently small bias.

We extensively illustrate the implications of our main result for the
important special case that $r_0$ is the conditional mean function in an
auxiliary nonparametric regression. For this setting, we derive simple and explicit
stochastic expansions that can not only be used to establish asymptotic nor- 
mality or the rate of consistency of the estimated regression function itself, 
but also to study the properties of more complex estimators, in which
estimation of a regression function merely constitutes an intermediate step, such
as structured nonparametric models imposing additive separability [Stone 
(1985)]. Our results thus cover a wide range of models, and should therefore 
be of general interest. We use our techniques to study two such examples in 
greater detail: nonparametric estimation of a simultaneous equation model 
and nonparametric estimation of a censored regression model. 

To the best of our knowledge, there are only a few papers on nonparametric
regression with estimated covariates that are not tailored to a specific
application. Andrews (1995) derives some results for generated covariates
converging at a parametric rate. Sperlich (2009) uses restrictive assumptions
which lead to asymptotic results that are different from the ones obtained
in the present paper. Song (2008) considers series estimation of the
functional $g(x, r) = E(Y|r(X) = x)$ indexed by
$x \in \mathcal{X} \subset \mathbb{R}$ and $r \in \Lambda$, where $\Lambda$ is
a function space with finite integral bracketing entropy, and derives a rate
of consistency uniformly over $(x, r) \in \mathcal{X} \times \Lambda$; see
also Einmahl and Mason
(2000) for a related problem. 

Our paper is also related to a recent literature on semiparametric estima- 
tion problems with generated covariates. Li and Wooldridge (2002) consider 
a partial linear model with generated covariates. Hahn and Ridder (2011) 
use pathwise derivatives to derive the influence function of semiparametric 
linear GMM-type estimators. Escanciano, Jacho-Chavez and Lewbel (2011) 
provide stochastic expansions for sample means of weighted semiparametric 
regression residuals with potentially generated regressors, and study their 
application to certain index models. Compared to the nonparametric prob- 
lems studied in this paper, semiparametric applications typically exhibit 
several additional technical issues. In particular, different techniques are 
needed to control the magnitude of certain remainder terms. Addressing 
these issues would require substantial refinements of our results, which are not
needed for the class of nonparametric problems we are focusing on. To keep 
the present paper more readable, we study semiparametric estimators with 
generated covariates separately in Mammen, Rothe and Schienle (2011). 

The outline of this paper is as follows. In the next section, we describe 
our setup in detail. Section 3 gives some motivating examples. Section 4 
establishes the asymptotic theory and states the main results. In Section 5, 
we apply our results to some of the examples given in Section 3, thus illus- 
trating their application in practice. Finally, Section 6 concludes. All proofs 
are collected in the Appendix. 

2. Nonparametric regression with generated covariates. The nonpara- 
metric regression model with generated regressors can be written as 

(2.1) $Y = m_0(r_0(S)) + \varepsilon$ with $E(\varepsilon|r_0(S)) = 0$,

where $Y$ is the dependent variable, $S$ is a $p$-dimensional vector of
covariates, $m_0 : \mathbb{R}^d \to \mathbb{R}$ and
$r_0 : \mathbb{R}^p \to \mathbb{R}^d$ are unknown functions and $\varepsilon$
is an error term that has mean zero conditional on the true value of the
covariates $r_0(S)$.^1 We assume that there is additional information available
outside of the basic model (2.1) such that the function $r_0$ is identified. For
example, $r_0$ could be (some known transformation of) the mean function in
an auxiliary nonparametric regression, which might involve another random
vector, say $T$, in addition to $Y$ and $S$.

^1 Note that, in contrast to an earlier working paper version of this paper,
we no longer assume that the "index" $r_0(S)$ is a sufficient statistic for
the covariates $S$, which would imply that $E(Y|r_0(S)) = E(Y|S)$.

Our aim is to estimate the function $m_0(x) = E(Y|r_0(S) = x)$. Since $r_0$ is
unobserved, obtaining a direct estimator based on a nonparametric regression
of $Y$ on $R = r_0(S)$ is clearly not feasible. We therefore consider the
following two-stage procedure. In the first stage, an estimate $\hat r$ of
$r_0$ is obtained. We do not require a specific estimator for this step.
Instead, we only impose the high-level restrictions that the estimator
$\hat r$ is uniformly consistent, converging at a rate specified below, and
takes on values in a function class that is not too complex. Depending on
the nature of the function $r_0$, these kinds of regularity conditions are
typically satisfied by various common nonparametric estimators, such as
kernel-based procedures or series estimators, under suitable smoothness
restrictions. In the second step, we then obtain our estimate of $m_0$
through a nonparametric regression of $Y$ on the generated covariates
$\hat R = \hat r(S)$, using local linear smoothing. That is, our estimator is
given by $\hat m_{LL}(x) = \hat\alpha$ obtained from

$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha, \beta} \sum_{i=1}^{n} (Y_i - \alpha - \beta^\top(\hat R_i - x))^2 K_h(\hat R_i - x),$$

where $K_h(u) = \prod_{j=1}^{d} \mathcal{K}(u_j/h_j)/h_j$ is a $d$-dimensional
product kernel built from the univariate kernel function $\mathcal{K}$, and
$h = (h_1, \ldots, h_d)$ is a vector of bandwidths that tend to zero as the
sample size $n$ increases to infinity.
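
The weighted least squares problem above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not part of the paper: the function names (`triweight`, `product_kernel`, `local_linear`) are hypothetical, the triweight kernel is one possible choice of a twice continuously differentiable density on $[-1, 1]$, and the toy data merely check that a local linear fit reproduces an exactly linear function.

```python
import numpy as np

def triweight(u):
    """Univariate kernel (an assumed choice): a C^2 symmetric density on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)

def product_kernel(u, h):
    """K_h(u) = prod_j K(u_j / h_j) / h_j, evaluated row-wise for u of shape (n, d)."""
    return np.prod(triweight(u / h) / h, axis=1)

def local_linear(x, R, Y, h):
    """Intercept of the kernel-weighted least squares fit of Y on (1, R - x)."""
    diff = R - x                                   # rows R_i - x
    w = product_kernel(diff, h)                    # weights K_h(R_i - x)
    X = np.column_stack([np.ones(len(Y)), diff])   # design matrix (1, R_i - x)
    XtW = X.T * w                                  # X'W without forming diag(w)
    coef = np.linalg.solve(XtW @ X, XtW @ Y)       # (alpha, beta)
    return coef[0]

# Toy check with hypothetical data: Y is exactly linear in the covariates.
rng = np.random.default_rng(0)
R = rng.uniform(0.0, 1.0, size=(500, 2))
Y = 1.0 + 2.0 * R[:, 0] - R[:, 1]
x = np.array([0.5, 0.5])
h = np.array([0.3, 0.3])
estimate = local_linear(x, R, Y, h)
print(estimate)
```

With generated covariates, the same function would simply be called with the rows $\hat R_i = \hat r(S_i)$ in place of `R`.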

For the later asymptotic analysis, it will also be useful to compare
$\hat m_{LL}$ to an infeasible estimator $\tilde m_{LL}$ that uses the true
function $r_0$ instead of an estimate $\hat r$. Such an estimator can be
obtained by local linear smoothing of $Y$ versus $R = r_0(S)$, that is, it is
given by $\tilde m_{LL}(x) = \tilde\alpha$, where

$$(\tilde\alpha, \tilde\beta) = \arg\min_{\alpha, \beta} \sum_{i=1}^{n} (Y_i - \alpha - \beta^\top(R_i - x))^2 K_h(R_i - x).$$

In order to distinguish these two estimators, we refer to $\hat m_{LL}$ in the
following as the real estimator, and to $\tilde m_{LL}$ as the oracle estimator.

Our use of local linear estimators in this paper is based on the following 
considerations. First, in a classical setting with fully observed covariates, 
estimators based on local linear regression are known to have attractive 
properties with regard to boundary bias and design adaptivity [see Fan and 
Gijbels (1996) for an extensive discussion], and they allow a complete asymp- 
totic description of their distributional properties. In the present setting 
with generated covariates, these properties simplify the asymptotic treat- 
ment. The design adaptivity leads to a discussion of bias terms that does 
not require regular densities for the randomly perturbed covariates, and the 
complete asymptotic theory allows a clear description of how the final es- 
timator is affected by the estimation of the covariates. On the other hand, 
our assumptions on the estimation of the covariates are rather general and 
can be verified for a broad class of smoothing methods, including sieves and 
orthogonal series estimators. 

3. Motivating examples. There are many statistical applications which 
involve nonparametric estimation of a regression function using
nonparametrically generated covariates. In this section, we give an overview
of some of the most popular examples and explain how they fit into our
framework. In Section 5, we revisit the first three of these examples,
studying their asymptotic properties in detail. A thorough treatment of the
remaining examples
involves several additional technical issues beyond dealing with the presence 
of estimated covariates, such as boundary problems, and is thus omitted 
for brevity. See also Mammen, Rothe and Schienle (2011) for an extensive 
discussion of semiparametric problems with generated covariates. 

3.1. The generic example: Nonparametric two-stage regression. In many 
applications, the unknown function $r_0$ is a conditional expectation function
from an auxiliary nonparametric regression. As a first motivating example,
we therefore consider a "two-stage" nonparametric regression model given by

$$Y = m_0(r_0(S)) + \varepsilon,$$
$$T = r_0(S) + \zeta,$$

where $\zeta$ is an unobserved error term that satisfies
$E[\zeta|S] = E[\varepsilon|r_0(S)] = 0$. As the structure of this example is
particularly simple, it is used extensively in Section 5 below to illustrate
the application of our main result. Proceeding like this is instructive, as
the types of technical difficulties encountered in this example are
representative of those in a wide range of other statistical applications.
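
To make the two-stage structure concrete, the following Python sketch simulates such a model and compares an estimator based on generated covariates with its oracle counterpart that uses the true covariate. Everything specific here is an assumption made for illustration only: the functions `r0` and `m0`, the noise levels, the bandwidths `g` and `h`, the triweight kernel and the sample size are hypothetical choices, not taken from the paper.

```python
import numpy as np

def triweight(u):
    """Assumed univariate kernel: a C^2 symmetric density on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)

def local_linear_1d(x, R, Y, h):
    """Local linear estimate of E(Y | R = x) for a scalar covariate R."""
    diff = R - x
    w = triweight(diff / h) / h
    X = np.column_stack([np.ones_like(diff), diff])
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ Y)[0]

def r0(s):
    """Hypothetical first-stage function (strictly increasing on [0, 1])."""
    return s + 0.1 * np.sin(2.0 * np.pi * s)

def m0(r):
    """Hypothetical second-stage regression function."""
    return np.cos(2.0 * r)

rng = np.random.default_rng(0)
n = 2000
S = rng.uniform(0.0, 1.0, n)
T = r0(S) + 0.3 * rng.standard_normal(n)        # auxiliary regression
Y = m0(r0(S)) + 0.3 * rng.standard_normal(n)    # regression of interest

g, h = 0.1, 0.15                                # first- and second-stage bandwidths
R_hat = np.array([local_linear_1d(s, S, T, g) for s in S])  # generated covariates
R = r0(S)                                       # true covariates (infeasible)

x = 0.5
m_real = local_linear_1d(x, R_hat, Y, h)        # uses the estimated covariates
m_oracle = local_linear_1d(x, R, Y, h)          # uses the true covariates
print(m_real, m_oracle, m0(x))
```

In this design the two estimates should be close to each other and to `m0(x)`; how close depends on the hypothetical tuning choices above.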

3.2. Nonparametric censored regression. Consider a nonparametric re- 
gression model with fixed censoring, that is, 

(3.1) $Y = \max(0, \mu_0(X) - U)$,

where U is an unobserved mean zero error term that is assumed to be 
independent of the covariates X. Fixed censoring is a common phenomenon 
in many applications, for example, the analysis of wage data. Note that the 
censoring threshold could be different from zero, as long as it is known. 
Lewbel and Linton (2002) establish identification of the function $\mu_0$ under
the tail condition $\lim_{u \to -\infty} u F_U(u) = 0$ on the distribution
function $F_U$ of $U$. In particular, they show that the function can be
written as

(3.2) $\mu_0(x) = \lambda_0 - \int_{r_0(x)}^{\lambda_0} \frac{1}{q_0(r)}\,dr,$

where $r_0(x) = E(Y|X = x)$, $q_0(r) = E(I\{Y > 0\}|r_0(X) = r)$, and
$\lambda_0$ is some suitably chosen constant. An estimate of the function
$\mu_0$ can then be obtained from a sample analog of (3.2), that is, through
numerical integration of a nonparametric estimate of the function
$q_0(r)^{-1}$. Nonparametric estimation of $q_0$ involves nonparametrically
generated regressors, and thus fits into our framework with
$(Y, S) = (I\{Y > 0\}, X)$ and $r_0(S) = r_0(X)$.
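
The numerical integration step in (3.2) can be sketched as follows. This is a minimal illustration, not the procedure of Lewbel and Linton (2002): the helper `mu_from_q` is a hypothetical name, the trapezoidal rule is one possible quadrature, and instead of a nonparametric estimate of $q_0$ the example plugs in a closed form that holds under the assumed design $U \sim \mathrm{Uniform}[-1, 1]$, for which $r_0 = (\mu_0 + 1)^2/4$, $q_0(r) = \sqrt{r}$ on $(0, 1]$ and $\lambda_0 = 1$ is admissible.

```python
import numpy as np

def mu_from_q(r_x, q, lam0, num=2001):
    """Evaluate lam0 - int_{r_x}^{lam0} 1/q(r) dr with the trapezoidal rule.

    r_x  : value of the first-stage function r_0 at the point of interest
    q    : callable returning (an estimate of) q_0 on a grid of r values
    lam0 : the constant lambda_0 (treated as known in this sketch)
    """
    grid = np.linspace(r_x, lam0, num)
    vals = 1.0 / q(grid)
    integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))
    return lam0 - integral

# Hypothetical closed-form case: U uniform on [-1, 1], hence q_0(r) = sqrt(r).
mu_hat = mu_from_q(0.25, np.sqrt, 1.0)
print(mu_hat)
```

In an application, `q` would be replaced by a nonparametric regression of $I\{Y > 0\}$ on the generated covariate $\hat r(X)$.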

3.3. Nonparametric triangular simultaneous equation models. Covariates
that are correlated with disturbance terms appear in many economic models 
and are denoted as endogenous. When, for example, analyzing the relation- 
ship between wages and schooling, unobserved individual characteristics like 
ability or motivation might affect both the outcome and the explanatory 
variable. A common approach is to model these quantities jointly, achieving 
identification by using so-called instrumental variables, that are independent 
of unobservables, affect the endogenous variable, but exert no direct influ- 
ence on the outcome. Consider, for example, the nonparametric triangular 
simultaneous equation model discussed in Newey, Powell and Vella (1999), 
which is of the form 

(3.3) $Y = \mu_1(X_1, Z_1) + U$,

(3.4) $X_1 = \mu_2(Z_1, Z_2) + V$.

Here the interest is in estimating the function $\mu_1$. To achieve
identification, one imposes the restrictions $E(V|Z_1, Z_2) = 0$, $E(U) = 0$
and $E(U|Z_1, Z_2, V) = E(U|V)$, which follow, for example, if the vector of
exogenous covariates and instruments $Z = (Z_1, Z_2)$ is jointly independent
of the disturbances $(U, V)$. Now let
$m(x_1, z_1, v) = E(Y|X_1 = x_1, Z_1 = z_1, V = v)$. Under the above
assumptions, it is straightforward to show that

$$m(x_1, z_1, v) = \mu_1(x_1, z_1) + \lambda(v),$$

where $\lambda(v) = E(U|V = v)$. The first component of this additive model
could, for example, be estimated by marginal integration [Newey (1994a),
Linton and Nielsen (1995)], which relies on the fact that

(3.5) $\int m(x_1, z_1, v) f_V(v)\,dv = \mu_1(x_1, z_1),$

where $f_V$ is the probability density function of $V$. Implementing a sample
version of (3.5) requires estimating the function $m$. Since the residuals $V$
are not directly observed but must be estimated by some nonparametric
method, this fits into our framework with
$(Y, S) = (Y, (X_1, Z_1, Z_2), X_1)$ and
$r_0(S) = (X_1, Z_1, X_1 - \mu_2(Z_1, Z_2))$.
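
A sample version of (3.5) can be sketched in Python as below. The sketch simplifies the model by dropping $Z_1$ and using a single scalar instrument, and every concrete choice is an assumption made for illustration only: the functions `mu1` and `mu2`, the error distributions, the bandwidths, the triweight kernel and the helper name `local_linear` are all hypothetical.

```python
import numpy as np

def triweight(u):
    """Assumed univariate kernel: a C^2 symmetric density on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)

def local_linear(x, R, Y, h):
    """Local linear estimate at x from covariates R (n, d) with bandwidths h (d,)."""
    diff = R - x
    w = np.prod(triweight(diff / h) / h, axis=1)
    X = np.column_stack([np.ones(len(Y)), diff])
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ Y)[0]

def mu1(x):
    """Hypothetical structural function of interest."""
    return x**2

def mu2(z):
    """Hypothetical first-stage function."""
    return z

rng = np.random.default_rng(1)
n = 1500
Z = rng.uniform(-2.0, 2.0, n)                # instrument
V = rng.uniform(-0.5, 0.5, n)                # first-stage disturbance
U = V + 0.2 * rng.standard_normal(n)         # E(U | V) = V, so X1 is endogenous
X1 = mu2(Z) + V
Y = mu1(X1) + U

# Step 1: residuals from a nonparametric regression of X1 on Z.
g = np.array([0.3])
fitted = np.array([local_linear(np.array([z]), Z[:, None], X1, g) for z in Z])
V_hat = X1 - fitted

# Step 2: m(x1, v) = E(Y | X1 = x1, V = v) is estimated using the generated V_hat.
R_hat = np.column_stack([X1, V_hat])
h = np.array([0.3, 0.3])

# Step 3: sample version of (3.5), averaging m_hat(x1, V_hat_i) over i.
x1 = 1.0
mu1_hat = np.mean([local_linear(np.array([x1, v]), R_hat, Y, h) for v in V_hat])
print(mu1_hat, mu1(x1))
```

The averaging in Step 3 replaces the integral against $f_V$ by an empirical mean over the estimated residuals, which is one common way to implement marginal integration.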

Remark 1. An alternative to marginal integration would be an ap- 
proach based on smooth backfitting [Mammen, Linton and Nielsen (1999)]. 
Smooth backfitting estimators avoid several problems encountered by marginal
integration in case of covariates with moderate or high dimension, but
require a more involved statistical analysis which is beyond the scope of
the present paper. We are going to study smooth backfitting with nonpara- 
metrically generated covariates in a separate paper. 

3.4. Generalized Roy model. D'Haultfoeuille and Maurel (2009) consider
a generalized Roy model of occupational choice that is related to the previous
example in the sense that it also leads to an additive regression model. Let
$Y_k$ denote the individual's potential earnings in sector $k \in \{0, 1\}$ of
an economy, $X = (X_0, X_1, X_c)$ a vector of covariates, and assume that
$E(Y_k|X, \eta_0, \eta_1) = \psi_k(X_k, X_c) + \eta_k$, where
$(\eta_0, \eta_1)$ are sector-specific productivity terms known by the agent
but unobserved by the analyst. Expected utility from working in sector $k$ is
assumed to be $U_k = E(Y_k|X, \eta_0, \eta_1) + G_k(X)$, the sum of
sector-specific expected earnings and a nonpecuniary component that depends
on $X$. Along with $X$, the analyst observes the chosen sector $D$, which
satisfies $D = I\{U_1 > U_0\}$, and the realized earnings
$Y = DY_1 + (1 - D)Y_0$.

One object of interest in this context is the pair of functions
$(\psi_1, \psi_0)$. Under some weak additional conditions, d'Haultfoeuille and
Maurel (2009) show that

$$E(Y|D = d, X) = \psi_d(X_d, X_c) + \lambda_d(\Pr(D = d|X))$$

for $d \in \{0, 1\}$, which is again an additive model involving unobserved
covariates, namely the conditional probabilities $\Pr(D = d|X)$ of choosing
sector $d$. This setting fits into our framework in the same way as the
previous example.

3.5. Nonparametric nonseparable triangular simultaneous equation models.
Imbens and Newey (2009) consider a generalized version of the above-mentioned
triangular simultaneous equation model with nonadditive disturbances:

(3.6) $Y = \mu_1(X_1, Z_1, U)$,

(3.7) $X_1 = \mu_2(Z_1, Z_2, V)$.

Nonseparable models have become popular in the recent econometric
literature, as they allow for substantially more general forms of unobserved
heterogeneity than specifications in which the disturbance terms enter
additively. The focus here is typically on averages of the function $\mu_1$,
such as the average structural function,

$$\mathrm{ASF}(x_1, z_1) = E_U(\mu_1(x_1, z_1, U)).$$

To achieve identification, assume that the function $\mu_2$ is strictly
monotone in its last argument, that $V$ is continuously distributed, and that
the unobserved disturbances $(U, V)$ are jointly independent of $Z$. Then it
can be shown that $U$ and $(X_1, Z_1)$ are independent conditional on the
so-called control variable $W = F_{X_1|Z}(X_1, Z)$, where $F_{X_1|Z}$ denotes
the distribution function of $X_1$ given $Z$. Under an additional support
condition, this result implies that the ASF is identified through the
relationship

(3.8) $\mathrm{ASF}(x_1, z_1) = \int_0^1 m(x_1, z_1, w)\,dw,$

where $m(x_1, z_1, w) = E(Y|X_1 = x_1, Z_1 = z_1, W = w)$. Since the control
variable $W$ is unobserved and has to be estimated in order to implement
a sample analog estimator of (3.8), this setting also fits into the framework
of this paper. In particular, nonparametric estimation of $m$ is covered with
$(Y, S) = (Y, (X_1, Z_1, Z_2), X_1)$ and
$r_0(S) = (X_1, Z_1, F_{X_1|Z}(X_1, Z))$.

4. Asymptotic properties. It is straightforward to show that $\hat m_{LL}$
consistently estimates the function $m_0$ under standard conditions. Obtaining
refined asymptotic properties, however, requires more involved arguments.
In this section, we derive a stochastic expansion of the difference between the
real and the oracle estimator, in which the leading terms are kernel-weighted
averages of the first stage estimation error. This is our main result. It can
be used, for example, to obtain uniform rates of consistency for the real
estimator, or to prove its asymptotic normality. We demonstrate this in the
next section for specific forms of $r_0$ and $\hat r$.

Throughout this section, we use the notation that for any vector
$a \in \mathbb{R}^d$ the value $a_{\min} = \min_{1 \le j \le d} a_j$ denotes
the smallest of its elements, $a_+ = \sum_{j=1}^{d} a_j$ denotes the sum of its
elements, $a_{-k} = (a_1, \ldots, a_{k-1}, a_{k+1}, \ldots, a_d)$ denotes the
$(d-1)$-dimensional subvector of $a$ with the $k$th element removed and
$a^b = (a_1^{b_1}, \ldots, a_d^{b_d})$ for any vector $b \in \mathbb{R}^d$.
For ease of presentation in the following, we avoid logarithmic terms in rates
of convergence; that is, we state assumptions and results in the form
$o_p(n^{-\xi})$ instead of $O_p(\log(n)^{\gamma} n^{-\xi})$ with
$\xi, \gamma > 0$.

4.1. Assumptions. In order to analyze the asymptotic properties of the 
local linear estimator with nonparametrically generated regressors, we make
the following assumptions. 

Assumption 1 (Regularity conditions). We assume the following properties
for the data distribution, the bandwidth, and kernel function $\mathcal{K}$:

(i) The sample observations $(Y_i, S_i)$ are i.i.d.

(ii) The random vector $R = r_0(S)$ is continuously distributed with compact
support $I_R$. Its density function $f_R$ is twice continuously differentiable
and bounded away from zero on $I_R$.

(iii) The function $m_0$ is twice continuously differentiable on $I_R$.

(iv) $E[\exp(l|\varepsilon|)|S] < C$ almost surely for a constant $C > 0$ and
$l > 0$ small enough.

(v) The kernel function $\mathcal{K}$ is a twice continuously differentiable,
symmetric density function with compact support, say $[-1, 1]$.

(vi) The bandwidth vector $h = (h_1, \ldots, h_d)$ satisfies
$h_j \sim n^{-\eta_j}$ for $j = 1, \ldots, d$ and $\eta_+ < 1$.

Most conditions in Assumption 1 are standard regularity and smoothness
conditions for kernel-type nonparametric regression, with the exception of
Assumption 1(iv). The subexponential tails of $\varepsilon$ conditional on $S$
assumed there are needed to apply certain results from empirical process
theory in our proofs. Such a condition is not very restrictive though.

Assumption 2 (Accuracy). The components $\hat r_j$ and $r_{0,j}$ of $\hat r$
and $r_0$, respectively, satisfy

$$\sup_s |\hat r_j(s) - r_{0,j}(s)| = o_p(n^{-\delta_j})$$

for some $\delta_j > \eta_j$ and all $j = 1, \ldots, d$.

Assumption 2 is a "high-level" restriction on the accuracy of the estimator
$\hat r$. It requires each component of the estimate of the function $r_0$ to
be uniformly consistent, converging at a rate at least as fast as the
corresponding bandwidth in the second stage of the estimation procedure. This
is typically not a restrictive condition, and it allows for estimators
$\hat r$ that converge at a rate slower than the oracle estimator
$\tilde m_{LL}$. Uniform rates of consistency are widely available for all
common nonparametric estimators; see, for example, Masry (1996) for results
on the Nadaraya-Watson, local linear and local polynomial estimators, or
Newey (1997) for series estimators.

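Assumption 3 (Complexity). There exist sequences of sets
$\mathcal{M}_{n,j}$ such that:

(i) $\Pr(\hat r_j \in \mathcal{M}_{n,j}) \to 1$ as $n \to \infty$ for all
$j = 1, \ldots, d$.

(ii) For a constant $C_M > 0$ and a function $r_{n,j}$ with
$\|r_{n,j} - r_{0,j}\|_\infty = o(n^{-\delta_j})$, the set
$\bar{\mathcal{M}}_{n,j} = \mathcal{M}_{n,j} \cap \{r_j : \|r_j - r_{n,j}\|_\infty < n^{-\delta_j}\}$
can be covered by at most $C_M \exp(\lambda^{-\alpha_j} n^{\xi_j})$ balls with
$\|\cdot\|_\infty$-radius $\lambda$ for all $\lambda < n^{-\delta_j}$, where
$0 < \alpha_j < 2$, $\xi_j \in \mathbb{R}$ and $\|\cdot\|_\infty$ denotes the
supremum norm.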

Assumption 3 requires the first-stage estimator $\hat r$ to take values in
a function space $\mathcal{M}_{n,j}$ that is not too complex, with probability
approaching 1. Here the complexity of the function space is measured by the
cardinality of the covering sets. This is a typical requirement for many
results from empirical process theory; see van der Vaart and Wellner (1996).
The second part of Assumption 3 is typically fulfilled under suitable
smoothness restrictions. For example, suppose that $\mathcal{M}_{n,j}$ is the
set of functions defined on some compact set $I_S \subset \mathbb{R}^p$ whose
partial derivatives up to order $k$ exist and are uniformly bounded by some
multiple of $n^{\xi_j^*}$ for some $\xi_j^* > 0$. Then Assumption 3(ii) holds
with $\alpha_j = p/k$ and $\xi_j = \xi_j^* \alpha_j$ [van der Vaart and
Wellner (1996), Corollary 2.7.2]. For kernel-based estimators of $r_0$, one
can then verify part (i) of Assumption 3 by explicitly calculating the
derivatives. Consider, for example, the one-dimensional Nadaraya-Watson
estimator $\hat r_j$ with bandwidth of order $n^{-1/5}$. Choose $r_{n,j}$
equal to $r_{0,j}$ plus asymptotic bias term. Then one can check that the
second derivative of $\hat r_j - r_{n,j}$ is absolutely bounded by
$O_p(\sqrt{\log n}) = o_p(n^{\xi_j^*})$ for all $\xi_j^* > 0$. For sieve and
orthogonal series estimators, Assumption 3(i) immediately holds when the
set $\mathcal{M}_{n,j}$ is chosen as the sieve set or as a subset of the
linear span of an increasing number of basis functions, respectively. For
a discussion of entropy bounds and further references, we refer to van de
Geer (2000).
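
The values of $\alpha_j$ and $\xi_j$ in the smoothness example can be traced back to a simple rescaling argument. The following is only a sketch under stated assumptions: $\mathcal{F}_1$ denotes the class of functions on $I_S$ whose partial derivatives up to order $k$ are bounded by 1, $N(\lambda, \mathcal{F}, \|\cdot\|_\infty)$ denotes a covering number, and $K$ and $c$ are generic constants; none of this notation is taken from the paper.

```latex
% Assumed entropy bound for the unit smoothness class (generic constant K):
%   \log N(\epsilon, \mathcal{F}_1, \|\cdot\|_\infty) \le K \epsilon^{-p/k}.
% If derivatives are bounded by M_n = c\, n^{\xi_j^*}, every function in the
% class is M_n times an element of \mathcal{F}_1, so a (\lambda / M_n)-cover
% of \mathcal{F}_1 yields a \lambda-cover of the rescaled class:
\begin{align*}
\log N(\lambda, M_n \mathcal{F}_1, \|\cdot\|_\infty)
  &= \log N(\lambda / M_n, \mathcal{F}_1, \|\cdot\|_\infty) \\
  &\le K (\lambda / M_n)^{-p/k}
   = K c^{p/k} \, \lambda^{-p/k} \, n^{\xi_j^* p/k}.
\end{align*}
% Up to constants this has the form \lambda^{-\alpha_j} n^{\xi_j} with
%   \alpha_j = p/k  and  \xi_j = \xi_j^* \alpha_j.
```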

Assumption 4 (Continuity). For any
$r \in \mathcal{M}_n = \mathcal{M}_{n,1} \times \cdots \times \mathcal{M}_{n,d}$
the conditional expectation $\tau(x, r) = E(\rho(S)|r(S) = x)$ with
$\rho(S) = E(Y|S) - E(Y|r_0(S))$ exists and is twice differentiable with
respect to its first argument, with derivatives that are uniformly bounded in
absolute value, and satisfies

$$\|\tau(x, r_1) - \tau(x, r_2)\| < C^* \|r_1 - r_2\|_\infty \quad \text{a.s.}$$

for all $r_1, r_2 \in \mathcal{M}_n$ and a constant $C^* > 0$.

Assumption 4 imposes certain smoothness restrictions on the conditional
expectation of $\rho(S)$. The term $\rho(S)$ can be thought of as capturing
the influence of the underlying covariates $S$ on the outcome variable $Y$
that is not exerted through the "index" $r_0(S)$. In certain applications, the
"index" $r_0(S)$ is a sufficient statistic for the function $m_0$, and thus
$\rho(S) = 0$ with probability 1. In this case, Assumption 4 is trivially
satisfied. Note that $\rho(S) = E(\varepsilon|S)$, and that
$\tau(\cdot, r_0) = 0$ by construction.

4.2. The key stochastic expansion. With the assumptions given in the
previous section, we are now ready to state our main result, which is
a stochastic expansion of the real estimator $\hat m_{LL}(x)$ around the
oracle estimator $\tilde m_{LL}(x)$. Our aim is to derive an explicit
characterization of the influence of the presence of generated regressors on
the final estimator of the function $m_0$. To this end, we define
$w(x, r) = (1, (r_1(S) - x_1)/h_1, \ldots, (r_d(S) - x_d)/h_d)$, and set
$N_h(x) = E(w(x, r) w(x, r)^\top K_h(r(S) - x))$. Next, we define

$$\Delta(x, r) = e_1^\top N_h(x)^{-1} E(K_h(r_0(S) - x)(r(S) - r_0(S))),$$

$$\Gamma(x, r) = e_1^\top N_h(x)^{-1} E(K_h'(r_0(S) - x)^\top (r(S) - r_0(S)) \rho(S))$$

for any $r \in \mathcal{M}_n$, where
$K_h'(u) = (K_{h,j}'(u) : j = 1, \ldots, d)^\top$ is a vector with elements
$K_{h,j}'(u) = \mathcal{K}'(u_j/h_j)/h_j^2 \prod_{j^* \ne j} \mathcal{K}(u_{j^*}/h_{j^*})/h_{j^*}$.
Finally, we put $\hat\Delta(x) = \Delta(x, \hat r)$ and
$\hat\Gamma(x) = \Gamma(x, \hat r)$. With this notation, we can now state our
main theorem.

Theorem 1. Suppose Assumptions 1–4 hold. Then

$$\sup_{x \in I_R} |\hat m_{LL}(x) - \tilde m_{LL}(x) + m_0'(x)\hat\Delta(x) - \hat\Gamma(x)| = O_p(n^{-\kappa}),$$

where $\kappa = \min\{\kappa_1, \ldots, \kappa_3\}$ with

Z A l<j<d 

$\kappa_2 < 2\eta_{\min} + (\delta - \eta)_{\min}$,
1^3 ^ "mill 

The two leading terms in our stochastic expansion of the real estimator
$\hat m_{LL}(x)$ around the oracle estimator $\tilde m_{LL}(x)$, which
account for the presence of generated covariates, are both smoothed versions
of the first-stage estimation error $\hat r(s) - r_0(s)$. To see this more
clearly, note that it follows from standard arguments for local polynomial
smoothing that

$$\hat\Delta(x) \approx \frac{1}{f_R(x)} E(K_h(r_0(S) - x)(\hat r(S) - r_0(S)))$$

and

$$\hat\Gamma(x) \approx \frac{1}{f_R(x)} E(K_h'(r_0(S) - x)^\top (\hat r(S) - r_0(S)) \rho(S))$$

uniformly over
$x \in I_{R,n}^- = \{x \in I_R : \text{the support of } K_h(\cdot - x) \text{ is a subset of } I_R\}$.
In order to achieve a certain rate of convergence for the real estimator, it
is thus not necessary to have an estimator of $r_0$ that converges with the
same rate or a faster one, since the asymptotic properties of the estimator
using nonparametrically generated regressors only depend on a smoothed
version of the first-stage estimation error. While smoothing does not affect
the order of the estimator's deterministic part, it typically reduces the
variance and thus allows for less precise first-stage estimators. Note that
the first adjustment term is negligible in regions where the regression
function is flat, since $m_0'(x) = 0$ in this case. Conversely, the impact of
generated covariates is accentuated when the true regression function is
steep. Also note that $\hat\Gamma(x) = 0$ when $E(\varepsilon|S) = 0$, as the
latter implies that $\rho(s) = 0$. This is a natural condition in certain
empirical applications.

Remark 2. In Theorem 1 no assumptions are made about the process
generating the data for estimation of $r_0$. In particular, nothing is assumed
about dependencies between the errors in the pilot estimation and the
regression errors $\varepsilon_i$. We conjecture that better rates than
$n^{-\kappa}$ can be proven under such additional assumptions, but the results
would only be specific to the respective full model under consideration. One
way to extend our approach to such a setting would be to use our empirical
process methods to bound the remainder term of higher order differences
between $\hat m_{LL}$ and $\tilde m_{LL}$, and to treat the leading terms of
the resulting higher order expansion by other, more direct methods.

5. Examples revisited. In this section, we apply our high-level results
from Section 4 to some of the motivating examples presented in Section 3,
which are representative of the others in terms of employed techniques.
Assuming a specific nature of the function $r_0$ and a specific method to
estimate it, explicit forms of the adjustment terms $\hat\Delta(x)$ and
$\hat\Gamma(x)$ in Theorem 1 can be derived in order to account for the
presence of generated covariates. Our focus in this section is on the
practically most important case that $r_0$ is the conditional mean function in
an auxiliary nonparametric regression. Many other applications can be treated
along the same lines.

5.1. Generic example: Two-stage nonparametric regression. The main
setting in which we illustrate the application of the stochastic expansion
from Theorem 1 is the "two-stage" nonparametric regression model given by

$$Y = m_0(r_0(S)) + \varepsilon,$$

$$T = r_0(S) + \zeta,$$

where $\zeta$ is an unobserved error term that satisfies
$E[\zeta|S] = E[\varepsilon|r_0(S)] = 0$. For simplicity, we focus on the case
that $R = r_0(S)$ is a one-dimensional covariate, but generalizations to
multiple generated covariates or the presence of additional observed
covariates are immediate.

Our strategy for deriving asymptotic properties of $\hat m_{LL}$ in this
framework is to first provide an explicit representation for the adjustment
terms $\hat\Delta(x)$ and $\hat\Gamma(x)$ from Theorem 1, which are then
combined with standard results about the oracle estimator $\tilde m_{LL}$.
For this approach it is convenient to use a kernel-based smoother to estimate
$r_0$. Since the bias of both $\hat\Delta(x)$ and $\hat\Gamma(x)$ is of the
same order as that of this first-stage estimator, we propose to estimate
the function $r_0$ via $q$th order local polynomial smoothing, which includes
the local linear estimator as the special case $q = 1$. Formally, the
estimator is given by $\hat r(s) = \hat\alpha$, where

(5.1) $(\hat\alpha, \hat\beta) = \arg\min_{\alpha, \beta} \sum_{i=1}^{n} \Bigl(T_i - \alpha - \sum_{1 \le u_+ \le q} \beta_u^\top (S_i - s)^u\Bigr)^2 L_g(S_i - s)$

and $L_g(s) = \prod_{j=1}^{p} \mathcal{L}(s_j/g)/g$ is a $p$-dimensional
product kernel built from the univariate kernel $\mathcal{L}$, $g$ is a vector
of bandwidths, whose components are assumed to be the same for simplicity,
and $\sum_{1 \le u_+ \le q}$ denotes the summation over all
$u = (u_1, \ldots, u_p)$ with $1 \le u_+ \le q$. When $r_0$ is sufficiently
smooth, the asymptotic bias of local polynomial estimators of order $q$ is
well known to be $O(g^{q+1})$ uniformly over $x \in I_R$ (if $q$ is odd), and
can thus be controlled. A further technical advantage of using local
polynomials is that the corresponding estimator admits a certain stochastic
expansion under general conditions, which is useful for our proofs. We make
the following assumption, which is essentially analogous to Assumption 1,
except for Assumption 5(iii). This additional assumption requires higher
order smoothness of the kernel, necessary to bound the $k$th derivative of
the estimator $\hat r$. This allows us to verify Complexity Assumption 3 for
$\hat r$.

Assumption 5. We assume the following properties for the data distri- 
bution, the bandwidth and kernel function C: 

(i) The observations (5i,li,Tj) are i.i.d., and the random vector S is 
continuously distributed with compact support !$■ Its density function fs 
is bounded and bounded away from zero on Ig. It is also differentiable with 
a bounded derivative. The residuals Q satisfy < oo for some e > 2. 
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(ii) The function ro is g + 1 times continuously differentiable on Is ■ 

(iii) The kernel function $L$ is a $k$-times continuously differentiable,
symmetric density function with compact support, say $[-1,1]$, for some natural
number $k \ge \max\{2, p/2\}$.

(iv) The bandwidth satisfies $g \sim n^{-\theta}$ for some $0 < \theta < 1/p$.
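
As an illustration of the first-stage smoother in (5.1), the following is a minimal sketch for the scalar case $p = 1$. The function names (`triweight`, `local_poly_fit`) and the choice of the triweight kernel are assumptions made for this example and are not taken from the paper; the triweight kernel is used only because it is a symmetric, compactly supported density that is twice continuously differentiable.

```python
import numpy as np

def triweight(u):
    """Triweight kernel: symmetric density on [-1, 1], twice continuously differentiable."""
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)

def local_poly_fit(s0, S, T, g, q=1):
    """q-th order local polynomial estimate of E[T | S = s0] for scalar S.

    Solves the weighted least squares problem (5.1) with p = 1 and
    returns the fitted intercept, i.e. the estimate r_hat(s0).
    """
    d = S - s0
    w = triweight(d / g) / g                      # kernel weights L_g(S_i - s0)
    X = np.vander(d, N=q + 1, increasing=True)    # columns 1, d, ..., d^q
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], T * sw, rcond=None)
    return coef[0]
```

On noise-free data generated by a polynomial of degree at most $q$, the local fit reproduces the regression function exactly, which gives a quick sanity check of the implementation.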

To simplify the presentation, we also assume that the function $r_0(s)$ is
strictly monotone in at least one of its arguments, which can be taken to
be the last one without loss of generality. This assumption could be easily
removed at the cost of a substantially more involved notation in the following
results.

Assumption 6. The function $r_0(u_{-p}, u_p)$ is strictly monotone in $u_p$,
and we have that $r_0(u_{-p}, \varphi(u_{-p}, x)) = x$ for some twice continuously
differentiable function $\varphi$.
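
To make Assumption 6 concrete, here is a small hypothetical example with $p = 2$. The particular function `r0` below is an assumption chosen for illustration only: it is strictly increasing in its last argument, and `phi` is the corresponding inverse in that argument.

```python
import math

def r0(u1, u2):
    """Hypothetical first-stage function, strictly increasing in u2."""
    return u1 + math.exp(u2)

def phi(u1, x):
    """Inverse of r0 in its last argument: r0(u1, phi(u1, x)) = x for x > u1."""
    return math.log(x - u1)
```

For any fixed $u_1$ and any $x > u_1$, composing the two functions returns $x$, and `phi` is smooth on that region.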

The following proposition shows that in the present context the function
$\Delta(x)$ can be written as the sum of a smoothed version of the first-stage
estimator's bias function, a kernel-weighted average of the first-stage residuals
$\zeta_1, \ldots, \zeta_n$ and some higher order remainder terms. For a concise presentation
of the result, we introduce some particular kernel functions. Let $L^*$ denote
the $p$-dimensional equivalent kernel of the local polynomial regression
estimator, given in (A.27) in the Appendix, and define the one-dimensional
kernel functions

$J_h(x, s) = \int K_h(r_0(s) - x - \partial_s r_0(s) u h) L^*(u) \, du.$

Then, with this notation, we obtain the following proposition.

Proposition 1. Suppose that Assumptions 1 and 4-6 hold. Then we
have for the correction factor $\Delta$ in Theorem 1 that

$\sup_{x \in I_R} |\Delta(x) - \Delta_A(x) - \Delta_B(x)| = O_p\big(\log(n)/(n g^p)\big),$

where the terms $\Delta_A(x)$ and $\Delta_B(x)$ satisfy

$\sup_{x \in I_R} |\Delta_A(x)| = O_p\big((\log(n)/(n \max\{g,h\}))^{1/2}\big)$ and $\sup_{x \in I_R} |\Delta_B(x)| = O_p(g^{q+1}).$

Moreover, uniformly over $x \in I_{R,n}^-$, it is $\Delta_B(x) = g^{q+1} E[b(S) \mid r_0(S) = x] + o_p(g^{q+1})$
with a bounded function $b(s)$ given in (A.25) in the Appendix, and
the term $\Delta_A(x)$ allows for the following expansions uniformly over $x \in I_{R,n}^-$,
depending on the limit of $g/h$:
(a) If $g/h \to 0$, then



h J \ nh 



1 / / 2 

(b) If h = g, then 

(c) If $g/h \to \infty$, then



1/2 /I / \ \ 1/2 



n 



It should be emphasized that in all three cases of the above proposition
the leading term in the expression for $\Delta_A(x)$ is equal to an average of the
error terms $\zeta_i$ weighted by a one-dimensional kernel function, irrespective
of $p = \dim(S)$. The dimension of the covariates thus affects the properties
of $\Delta(x)$ only through higher-order terms. Furthermore, it should be noted
that one can also derive expressions of $\Delta(x)$ similar to the ones above for
values of $x$ close to the boundary of the support. Likewise these take the
form of a one-dimensional kernel weighted average of the error terms $\zeta_i$ plus
a higher-order term. The corresponding kernel function, however, has a more
complicated closed form varying with the point of evaluation.

The following proposition establishes a result similar to Proposition 1
for the second adjustment term $\Gamma(x)$. We again introduce a particular
one-dimensional kernel function, defined as

H^{x,v) = j g^^L*(^s^p, V{v-p,x) — ^ _^ ^^^^ ds\{;v^p,x) 

with 

_ ip{v-p, ^{v-p, x)fs{v-p, ^{v-p, x)) det{dy^^ip{v-p, x)) 
fs{v-p,ip{v-p,x))dyj^ro{v-p,ip{v-p,x)) 

where L* still denotes the p-dimensional equivalent kernel of the local poly- 
nomial regression estimator, given in (A. 27) in the Appendix. 

Proposition 2. Suppose that Assumptions 1 and 4-6 hold. Then we
have that



sup |r(x) - TAix) - TBix)\ = Op 



x&Ir V ^9- 



log(n) 
where the terms $\Gamma_A(x)$ and $\Gamma_B(x)$ satisfy

$\sup_{x \in I_R} |\Gamma_A(x)| = O_p\big((\log(n)/(ng))^{1/2}\big)$ and $\sup_{x \in I_R} |\Gamma_B(x)| = O_p(g^{q+1}).$

Moreover, uniformly over $x \in I_{R,n}^-$, it is $\Gamma_B(x) = g^{q+1} \partial_x E[b(S)\rho(S) \mid r_0(S) = x] + o_p(g^{q+1})$
with a bounded function $b(s)$ given in (A.25) in the Appendix,
and the term $\Gamma_A(x)$ allows for the following expansion uniformly over $x \in I_{R,n}^-$:



(5.2) rix) = ——J2H^{^,S,)Q+op[ 

Again, the leading term in the expression for $\Gamma_A(x)$ is equal to an average
of the error terms $\zeta_i$ weighted by a one-dimensional kernel function, and thus
behaves similarly to a one-dimensional nonparametric regression estimator.
A similar result could be established for regions close to the boundary of
the support. Note that in contrast to Proposition 1, the details of the result
in Proposition 2 do not depend on the relative magnitude of the bandwidths
used in the first and second stage of the estimation procedure.

Combining Theorem 1 and Propositions 1-2 with well-known results about
the oracle estimator $\tilde m_{LL}$, various asymptotic properties of the real
estimator $\hat m_{LL}$ can be derived. In the following corollaries we present results for
the most relevant scenarios, addressing uniform rates of consistency and
stochastic expansions of order $o_p(n^{-2/5})$ for proving pointwise asymptotic
normality. More refined expansions of higher orders such as $o_p(n^{-1/2})$, which
are useful for the analysis of semiparametric problems in which $m_0$ plays
the role of an infinite dimensional nuisance parameter [e.g., Newey (1994b),
Andrews (1994), Chen, Linton and Van Keilegom (2003)], would also be
possible. We do not present such results here as they would require strong
smoothness restrictions that are unattractive in applications. See Mammen,
Rothe and Schienle (2011) for an alternative approach to controlling the
influence of generated covariates in semiparametric models.

Starting with the uniform rate of consistency, it is well known
[Masry (1996)] that under Assumption 1 the oracle estimator satisfies

$\sup_{x \in I_R} |\tilde m_{LL}(x) - m_0(x)| = O_p\big((\log(n)/(nh))^{1/2} + h^2\big).$

This implies the following result.

Corollary 1. Suppose that Assumptions 1, 4 and 5 hold. Then 
Straightforward calculations show that, under appropriate smoothness
restrictions, it is possible to recover the oracle rate for the real estimator
given a suitable choice of $\eta$ and $\theta$, even if the first-stage estimator converges
at a strictly slower rate. Note that the rate in Corollary 1 improves upon
a bound on the uniform rate of convergence of a two-stage regression
estimator derived in Ahn (1995) for a similar setting.
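
The trade-off behind the oracle rate can be checked with simple arithmetic. The sketch below assumes a one-dimensional $R$, a bandwidth $h = n^{-\eta}$, and ignores logarithmic factors; under these assumptions the oracle bound $(nh)^{-1/2} + h^2$ is of order $n^{-a}$ with $a = \min\{(1-\eta)/2, 2\eta\}$. The helper name `oracle_rate_exponent` is hypothetical.

```python
def oracle_rate_exponent(eta):
    """Exponent a such that (n*h)**-0.5 + h**2 is of order n**-a when h = n**-eta.

    Logarithmic factors are ignored; R is assumed to be one-dimensional.
    """
    stochastic = (1.0 - eta) / 2.0   # from the (n h)^{-1/2} term
    bias = 2.0 * eta                 # from the h^2 term
    return min(stochastic, bias)
```

The two terms balance at $\eta = 1/5$, which yields the familiar exponent $2/5$ used in the expansions that follow.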

Next, we derive stochastic expansions of $\hat m_{LL}$ of order $o_p(n^{-2/5})$ for the
case that $\eta = 1/5$. Such expansions immediately imply results on pointwise
asymptotic normality of the real estimator. We start with the case that
$\theta = \eta$, in which the stochastic terms $\Gamma_A(x)$ and $\Delta_A(x)$ are of the same order
of magnitude (other bandwidth choices will be discussed below). During the
analysis of this setting, it becomes clear that applying Theorem 1 requires
$p\theta < 3/10$. Thus in order to use the expansion in Proposition 1(b), only $p = 1$
is admissible; that is, $S$ must be one-dimensional for the choice $\theta = \eta$ to be
feasible. In this setting, the notation for the kernel functions appearing in
the stochastic expansions can be somewhat simplified. We define

$J(v,x) = \int K(v - r_0'(r_0^{-1}(x))u) L^*(u) \, du,$

where 



{v,x) = j L*{v + sdxr^\x))dsX{x) 
^^■^_Up{roHx))fs{r^Hx))) 



fs{r^\x)y,{r-\x)) ' 

where $r_0^{-1}$ is the inverse function of $r_0$, which exists by Assumption 6.

Corollary 2. Suppose that Assumptions 1 and 4-6 hold with $\eta = \theta = 1/5$
and $p = q = 1$. Then the following expansions hold uniformly over $x \in I_{R,n}^-$:

$\hat m_{LL}(x) - m_0(x) = \frac{1}{n f_R(x)} \sum_{i=1}^{n} K_h(r_0(S_i) - x)\varepsilon_i - \frac{1}{n f_R(x)} \sum_{i=1}^{n} \big(m_0'(x) J_h(r_0(S_i) - x, x) - H_h(S_i - r_0^{-1}(x), x)\big)\zeta_i + \tfrac{1}{2}\beta(x)h^2 + o_p(n^{-2/5}),$

where the bias is given by

$\beta(x) = \int u^2 K(u)\,du \; m_0''(x) - \int u^2 L(u)\,du \, \big(r_0''(r_0^{-1}(x)) m_0'(x) - \partial_x[r_0''(r_0^{-1}(x))\rho(r_0^{-1}(x))]\big).$

In particular, we have

$(nh)^{1/2}\big(\hat m_{LL}(x) - m_0(x) - \tfrac{1}{2}\beta(x)h^2\big) \to_d N(0, \sigma^2(x)),$

where $\sigma^2(x) = [\mathrm{Var}(\varepsilon \mid R = x) \int K(t)^2 \, dt - 2E(\varepsilon\zeta \mid R = x) \int K(t)(J(t,x)m_0'(x) - H(t,x)) \, dt + \mathrm{Var}(\zeta \mid R = x) \int (m_0'(x)J(t,x) - H(t,x))^2 \, dt] / f_R(x)$
is the asymptotic variance.

Under the conditions of the corollary, the limiting distribution of $\hat m_{LL}(x)$
is generally affected by the pilot estimation step, although a qualitative
description of the impact seems difficult. Depending on the curvature of $m_0$
and the covariance of $\varepsilon$ and $\zeta$, the asymptotic variance of the estimator
using generated regressors can be bigger or smaller than that of the
oracle estimator $\tilde m_{LL}$. There thus exist settings where in practice it would be
preferable to base inference on the real estimator even if one was actually
able to compute the oracle estimator.
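
The distinction between the real and the oracle estimator can be illustrated by a small simulation. Everything in this sketch is an assumption made for illustration: the data-generating process (`r0`, `m0`, uniform $S$, independent Gaussian errors with standard deviation 0.2), the triweight kernel, and the choice $g = h = n^{-1/5}$ with $p = q = 1$. It is not the authors' code and does not reproduce any result from the paper.

```python
import numpy as np

def triweight(u):
    """Symmetric, compactly supported kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)

def local_linear(x0, X, Y, h):
    """Local linear estimate of E[Y | X = x0] via weighted least squares."""
    d = X - x0
    sw = np.sqrt(triweight(d / h))
    A = np.column_stack([np.ones_like(d), d])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], Y * sw, rcond=None)
    return coef[0]

def simulate(n=1000, x0=0.75, seed=0):
    """Return (real estimate, oracle estimate, true value) at the point x0."""
    rng = np.random.default_rng(seed)
    r0 = lambda s: s + 0.5 * s**2          # hypothetical first-stage function
    m0 = lambda r: np.sin(2.0 * r)         # hypothetical second-stage function
    S = rng.uniform(0.0, 1.0, n)
    T = r0(S) + 0.2 * rng.standard_normal(n)        # T = r0(S) + zeta
    Y = m0(r0(S)) + 0.2 * rng.standard_normal(n)    # Y = m0(r0(S)) + eps
    g = h = n ** (-0.2)
    r_hat = np.array([local_linear(s, S, T, g) for s in S])  # generated covariates
    real = local_linear(x0, r_hat, Y, h)     # uses r_hat(S_i)
    oracle = local_linear(x0, r0(S), Y, h)   # uses the unobserved r0(S_i)
    return real, oracle, m0(x0)
```

Calling `simulate()` returns both estimates together with the true value $m_0(x_0)$; repeating the call over many seeds would give a Monte Carlo comparison of their variances.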

The next corollary considers the case that $\theta > \eta$, and thus $g/h \to 0$. Again,
applying Theorem 1 requires $p\theta < 3/10$ in this setting, and thus only $p = 1$ is
admissible when using Proposition 1(a) for such a choice of bandwidths. The
corollary also focuses on the special case that $\rho(S) := E(Y|R) - E(Y|S) = 0$,
which implies that $\Gamma(x) = 0$ with probability 1. This condition is satisfied
for certain empirical applications, such as, for example, IV models.
Without this additional restriction, an expansion of the difference $\hat m_{LL}(x) - m_0(x)$
would be dominated by the term $\Gamma_A(x)$, which is $O_p((\log(n)/(ng))^{1/2})$
and thus converges at a slower rate than the oracle estimator.

Corollary 3. Suppose that Assumptions 1, 4 and 5 hold with $\eta = 1/5$,
$1/5 < \theta < 3/10$ and $p = q = 1$, and that $\rho(S) = 0$ with probability 1. Then the
following expansion holds uniformly over $x \in I_{R,n}^-$:



where $\sigma^2(x) = \mathrm{Var}(\varepsilon - m_0'(R)\zeta \mid R = x) \int K(t)^2 \, dt / f_R(x)$ is the asymptotic
variance.

The limiting distribution of $\hat m_{LL}(x)$ is again affected by the use of
generated covariates under the conditions of the corollary. In this particular case,
the form of the asymptotic variance has an intuitive interpretation: the
estimator $\hat m_{LL}(x)$ has the same limiting distribution as the local linear oracle
estimator in the hypothetical regression model

$Y = m_0(r_0(S)) + \varepsilon^*,$

where $\varepsilon^* = \varepsilon - m_0'(r_0(S))\zeta$. As in Corollary 2 above, depending on the
curvature of $m_0$ and the covariance of $\varepsilon$ and $\zeta$, the asymptotic variance of the
estimator using generated regressors can be bigger or smaller than that of
the oracle estimator $\tilde m_{LL}$.

The next corollary discusses the case when $\theta < \eta$. For such a choice of
bandwidth, applying Theorem 1 requires no restrictions on the dimensionality
of $S$. It turns out that in this case $\hat m_{LL}(x) = \tilde m_{LL}(x) + o_p(n^{-2/5})$, and thus
the limit distribution of $\hat m_{LL}$ is the same as for the oracle estimator $\tilde m_{LL}$.
The effect exerted by the presence of nonparametrically generated regressors
is thus first-order asymptotically negligible for conducting inference on $m_0$
in this case.

Corollary 4. Suppose that Assumptions 1, 4 and 5 hold with $\theta < \eta = 1/5$.
Then the following expansion holds uniformly over $x \in I_{R,n}^-$:

$\hat m_{LL}(x) - m_0(x) = \frac{1}{n f_R(x)} \sum_{i=1}^{n} K_h(r_0(S_i) - x)\varepsilon_i + \tfrac{1}{2}h^2 \int u^2 K(u)\,du \; m_0''(x) + o_p(n^{-2/5}).$

In particular, we have

$(nh)^{1/2}\Big(\hat m_{LL}(x) - m_0(x) - \tfrac{1}{2}h^2 \int u^2 K(u)\,du \; m_0''(x)\Big) \to_d N(0, \sigma^2(x)),$

where $\sigma^2(x) = \mathrm{Var}(\varepsilon \mid R = x) \int K(t)^2 \, dt / f_R(x)$ is the asymptotic variance.

5.2. Nonparametric censored regression. Consider estimation of the
censored regression model in (3.1). Let $\hat r(x)$ be the $q$th order local polynomial
estimator of the conditional mean $r_0(x) = E(Y|X = x)$, and let $\hat q(r)$ be the
local linear estimator of $q_0(r)$ using the generated covariates $\hat r(X_i)$. Then an
estimate of $\mu_0$ is given by
(5.3) fi{x) = X+ / ——du, 

Jf{x) QW 

where the constant $\lambda$ is chosen large enough to satisfy $\lambda > \max_{i=1,\ldots,n} \hat r(X_i)$
with probability tending to one. Generalizing Lewbel and Linton (2002),
we consider the use of higher-order local polynomials for the first stage
estimator, and allow the bandwidths used for the computation of $\hat r$ and $\hat q$
to be different. For presenting the asymptotic properties of $\hat\mu$, let $s_0(x) = E(I\{Y > 0\} \mid X = x)$
be the proportion of uncensored observations conditional
on $X = x$, and assume that this function is continuously differentiable and
bounded away from zero on the support of $X$. We then obtain the following
result.

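Corollary 5. Suppose that Assumptions 1 and 5 hold with $(Y,S,T) = (I\{Y > 0\}, X, Y)$
and $R = r_0(S) = r_0(X)$. Furthermore, suppose that $\theta \in (\underline\theta, \bar\theta)$,
where $\underline\theta$ and $\bar\theta$ are constants depending on $\eta$, $q$ and $p$ as follows:

$\bar\theta = \frac{1 - 3\eta}{p}$ and $\underline\theta = \max\Big\{ \frac{1 - 4\eta}{p}, \frac{1}{2(q+1) + p} \Big\}.$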

Under these conditions, we have that 

V^{fi{x)-M^))AN(o, ^ f^'^l , / L{tfdt 



fs{x)s^Q{x) 



where a'^.{x) = Var(y|X = x). 



The corollary is analogous to Theorem 5 in Lewbel and Linton (2002).
However, using our results substantially simplifies the proof and provides
insights on admissible choices of bandwidths. Note that the lower bound $\underline\theta$ is
chosen such that the biases of both $\hat r$ and $\hat q$ tend to zero at a rate faster than
$(ng^p)^{-1/2}$. Due to this undersmoothing, the limiting distribution of $\hat\mu - \mu_0$ is
centered at zero. Note that the final estimator converges at the same rate
as the generated regressors. This is due to the fact that the function $\hat r$ is
not only used to compute $\hat q$, but also determines the limits of integration
in (5.3). The "direct" influence of the generated regressors in the estimation
of $\hat q$ is asymptotically negligible in this particular application.
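
The undersmoothing requirement can be turned into a quick feasibility check. The sketch below assumes that the bounds in Corollary 5 read $\bar\theta = (1-3\eta)/p$ and $\underline\theta = \max\{(1-4\eta)/p, 1/(2(q+1)+p)\}$, with $g \sim n^{-\theta}$ and $h \sim n^{-\eta}$; the lower bound is what the two bias conditions $g^{q+1} = o((ng^p)^{-1/2})$ and $h^2 = o((ng^p)^{-1/2})$ imply. The helper name `theta_bounds` is hypothetical.

```python
def theta_bounds(eta, q, p):
    """Return (lower, upper) bounds for the first-stage bandwidth exponent theta.

    lower: both biases, g**(q+1) and h**2, vanish faster than (n * g**p)**-0.5.
    upper: assumed to be (1 - 3*eta) / p, as read from the statement of Corollary 5.
    """
    lower = max((1.0 - 4.0 * eta) / p, 1.0 / (2.0 * (q + 1) + p))
    upper = (1.0 - 3.0 * eta) / p
    return lower, upper

def has_admissible_theta(eta, q, p):
    """True if the open interval (lower, upper) is non-empty."""
    lower, upper = theta_bounds(eta, q, p)
    return lower < upper
```

For example, with $\eta = 1/5$ and $p = q = 1$ the interval is $(0.2, 0.4)$, whereas for $\eta = 1/4$, $q = 1$ and $p = 2$ the interval is empty, so a higher polynomial order $q$ would be needed.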

5.3. Nonparametric triangular simultaneous equation models. Now
consider nonparametric estimation of the structural function $\mu_1$ in the triangular
simultaneous equation model (3.3)-(3.4) using a marginal integration
estimator. In order to keep the notation simple, we restrict our attention to the
arguably most relevant case with a single endogenous regressor, but allow
for an arbitrary number of exogenous regressors and instruments. Let $\hat\mu_2(z)$
be the $q$th order local polynomial estimator of $\mu_2(z) = E(X_1|Z = z)$, and
let $\hat m(x_1, z_1, v)$ be the local linear estimator of $m(x_1, z_1, v) = E(Y|X_1 = x_1, Z_1 = z_1, V = v)$.
The latter is computed using the generated covariates
$\hat V_i = X_{1i} - \hat\mu_2(Z_i)$ instead of the true residuals $V_i$ from equation (3.4). For
simplicity, we use the same bandwidth for all components of $\hat m$; that is, we
put $\eta_j = \eta$ for all $j = 1, \ldots, (2 + d_1)$. The marginal integration estimator of
$\mu_1(x_1, z_1)$ is then given by the following sample version of (3.5):

(5.4)  $\hat\mu_1(x_1, z_1) = \frac{1}{n} \sum_{i=1}^{n} \hat m(x_1, z_1, \hat V_i).$

The following result establishes the estimator's asymptotic normality.
Corollary 6. Suppose that Assumption 1 holds with $(Y, S, T) = (Y, (X_1, Z_1, Z_2), X_1)$
and $R = r_0(S) = (X_1, Z_1, X_1 - \mu_2(Z_1, Z_2))$, and that Assumption 5
holds with $r_0(S) = \mu_2(Z_1, Z_2)$. Furthermore, suppose that rj G (max{l/
(5 + (ii),l/(2p + 3)},l/(l and that 6 e {9,6), where 9 and 9 are con- 

stants depending on rj, q and dj = dim(Zj) as follows: 

9 = and 9= ^ — , 

2p - 2{q+l) 

where p = di -\- d2- Under these conditions, we have that 

^/^^^(/il(xl,^l)-/il(xl,Zl))4Affo,Ef-^i^^^^^^ / k{tfdt], 

V \Ixz\v{xi,zi,V) J J ) 

where k{t) = )C{ti) is a (1 + di)- dimensional product kernel, and 

a'^{xi, zi,v) = Var(y — m{R)\R = {xi,zi,v)). 

Under the conditions of the corollary, the asymptotic variance of $\hat\mu_1(x_1, z_1)$
is not influenced by the presence of generated regressors: If $\hat m$ was replaced
in (5.4) with an oracle estimator $\tilde m$ using the actual disturbances $V_i$
instead of the reconstructed ones, the result would not change. Also, note that
the exclusion restrictions on the instruments imply that $E(Y|X_1, Z_1, V) = E(Y|X_1, Z_1, Z_2)$.
Therefore Assumption 4 is automatically satisfied, and the
adjustment term $\Gamma(x)$ from Theorem 1 is equal to zero and does not have
to be considered for the proof.
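
The sample version (5.4) is simply an average of the fitted regression function over the generated residuals. The following sketch shows that averaging step in isolation; `m_hat` stands for any fitted regression function of $(x_1, z_1, v)$ and is a hypothetical callable supplied by the user, not an object defined in the paper.

```python
import numpy as np

def marginal_integration(m_hat, x1, z1, V_hat):
    """Sample analogue of (5.4): average m_hat(x1, z1, V_hat_i) over i."""
    return float(np.mean([m_hat(x1, z1, v) for v in V_hat]))
```

With a toy function `m_hat(x1, z1, v) = x1 + z1 + v**2` and residuals $(-1, 0, 1)$, the estimator at $(x_1, z_1) = (1, 2)$ equals $3 + 2/3$.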

6. Conclusions. In this paper, we analyze the properties of nonparametric
estimators of a regression function, when some of the covariates are not
directly observable, but have been estimated by a nonparametric first-stage
procedure. We derive a stochastic expansion showing that the presence of
generated regressors affects the limit behavior of the estimator only through
a smoothed version of the first-stage estimation error. We apply our results
to a number of practically relevant statistical applications.

APPENDIX: PROOFS 

Throughout the Appendix, $C$ and $c$ denote generic constants chosen
sufficiently large or sufficiently small, respectively, which may have different
values at each appearance. Furthermore, define $\mathcal{M}_n = \mathcal{M}_{n,1} \times \cdots \times \mathcal{M}_{n,d}$.

A.l. Proof of Theorem 1. In order to prove the statement of the theo- 
rem, we have to introduce some notation. Throughout the proof of this and 
the following statements, we denote the unit vector (1,0, ...,0)^ in 
by ei. We also write Wi{x,r) = (l,(ri(5i) - xi) /hi, . . . ,{rd{Si) -Xd)/hd), 
and put Wi{x) = Wi{x, tq), Wi{x) = Wi{x, f) and Wi{x) = Wi{x, r). We also de- 
fine Mh{x,r) = n~'^YJi=iWi{x,r)wi{x,rY Kh{r{Si) - x), and put Mhix) = 
Mhix,ro), Mh{x) = Mhix,f) and Mh{x) = Mhix,f) and set N^ix) = E{Mh{x, 
ro)). Furthermore, define $\varepsilon^* = \varepsilon - \rho(S)$ and note that we have $E(\varepsilon^*|S) = 0$
by construction. It also holds that

$Y_i = m_0(r_0(S_i)) + \varepsilon_i^* + \rho(S_i).$

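Next, it follows from standard calculations that the real estimator $\hat m_{LL}$ can
be written as

$\hat m_{LL}(x) = m_0(x) + \hat m_{LL,A}(x) + \hat m_{LL,B}(x) + \hat m_{LL,C}(x) + \hat m_{LL,D}(x) + \hat m_{LL,E}(x),$

where $\hat m_{LL,j}(x) = \hat\alpha_j$ for $j \in \{A, B, C, D, E\}$, and

$(\hat\alpha_A, \hat\beta_A) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (\varepsilon_i^* - \alpha - \beta^T(\hat r(S_i) - x))^2 K_h(\hat r(S_i) - x),$

$(\hat\alpha_B, \hat\beta_B) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (m_0(r_0(S_i)) - m_0(x) - m_0'(x)^T(r_0(S_i) - x) - \alpha - \beta^T(\hat r(S_i) - x))^2 K_h(\hat r(S_i) - x),$

$(\hat\alpha_C, \hat\beta_C) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (-m_0'(x)^T(\hat r(S_i) - r_0(S_i)) - \alpha - \beta^T(\hat r(S_i) - x))^2 K_h(\hat r(S_i) - x),$

$(\hat\alpha_D, \hat\beta_D) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (m_0'(x)^T(\hat r(S_i) - x) - \alpha - \beta^T(\hat r(S_i) - x))^2 K_h(\hat r(S_i) - x),$

$(\hat\alpha_E, \hat\beta_E) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (\rho(S_i) - \alpha - \beta^T(\hat r(S_i) - x))^2 K_h(\hat r(S_i) - x).$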

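Similarly, the oracle estimator $\tilde m_{LL}$ can be represented as

$\tilde m_{LL}(x) = m_0(x) + \tilde m_{LL,A}(x) + \tilde m_{LL,B}(x) + \tilde m_{LL,C}(x) + \tilde m_{LL,D}(x) + \tilde m_{LL,E}(x),$

where $\tilde m_{LL,j}(x) = \tilde\alpha_j$ for $j \in \{A, B, C, D, E\}$, and

$(\tilde\alpha_A, \tilde\beta_A) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (\varepsilon_i^* - \alpha - \beta^T(r_0(S_i) - x))^2 K_h(r_0(S_i) - x),$

$(\tilde\alpha_B, \tilde\beta_B) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (m_0(r_0(S_i)) - m_0(x) - m_0'(x)^T(r_0(S_i) - x) - \alpha - \beta^T(r_0(S_i) - x))^2 K_h(r_0(S_i) - x),$

$(\tilde\alpha_C, \tilde\beta_C) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (-m_0'(x)^T(\hat r(S_i) - r_0(S_i)) - \alpha - \beta^T(r_0(S_i) - x))^2 K_h(r_0(S_i) - x),$

$(\tilde\alpha_D, \tilde\beta_D) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (m_0'(x)^T(r_0(S_i) - x) - \alpha - \beta^T(r_0(S_i) - x))^2 K_h(r_0(S_i) - x),$

$(\tilde\alpha_E, \tilde\beta_E) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (\rho(S_i) - \alpha - \beta^T(r_0(S_i) - x))^2 K_h(r_0(S_i) - x).$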
Note that by construction,

(A.1)  $\hat m_{LL,D}(x) \equiv \tilde m_{LL,D}(x) \equiv 0.$

We now argue that

(A.2)  $\sup_{x \in I_R} |\hat m_{LL,A}(x) - \tilde m_{LL,A}(x)| = O_p(n^{-\kappa_1}).$

For a proof of (A.2) note that rhLL,Aix) and fhLL,Aix) are given by the first 
elements of the vectors M(x)~-'^n~^ ^"^-^ -ftr/i(f(S'j) — x)ejt()i(x) andM(x)~^ x 
n~^'^^^i KhiroiSi) — x)eiWiix), respectively. Using these representations, 
one sees that (A.2) follows from Lemmas 1 and 2 below. 
As a second step, we now show that 

(A.3) sup \mLL,Ei^) - rhLL,Ei^) - f (x)| = Opin'"^ + n'"^ + n''^''). 

To prove (A.3), put /i(x) = ^YJi=i^hiriSi) - x)wiix)piSi) and /x(x) = 
^Ya=i KhiroiSi) - x)wiix)piSi), and write G(x) = eJiNhix))~^Eifiix) - 

fj-ix)). With this notation, rhiL^Eix) = ej Mhix)~^ fiix) and rhiL^Eix) = 
e]" X Mhix)~^ pix) . Using Lemma 4 and some results of Lemma 3, we then 
find that 

rhLL,Eix) - rhLL,Eix) - G(x) 

= eJiMhix)-^m - Mhix)-^pix) - nMhix))~^nKx) - Pix))) 

= C)p(n-«i/2)(i-'?+)+('5-'')-n) + ^-{(i/2)(i-r?+)+5„,i„) ^ ^ Op(n-''i) 

uniformly over x £ Ir. Using standard smoothing arguments, we also get 
that 

Gix)=ejNhix)~^Eifiix)-fiix)) 

= r ! N / iKhiriu) -x) -Khiroiu) - x)) piu) f siu) dx du 
JRix) J 

+ Op(n"^''""""(''"'')'"'") 
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1 



Kl{ro{u) -x){f{u) -ro{u))p{u)fs{u)dxdu 



fR{x) 

= f (x) + Op(n-^2) + Op(n-''3) 

uniformly over x G /p. This shows the claim in (A. 3). 
Finally, from Lemmas 2 and 3 we get that 



(A.4) 
(A.5) 



sup \rhLL,B{x) - rhLL,B{x)\ = Op{n '^'^), 



sup \rhLL,c{x) - rhLLfiix)] = Op{n~ 



and it is easy to see that 

(A.6) sup \mLL,c{x) - m'Q{x)A{x)\ = Opin''^). 

Taken together, the results in (A.1)-(A.6) imply the statement of the theo- 
rem. 

Lemma 1. Suppose that the conditions of Theorem 1 hold. Then 



sup 



-. n 1 " 

- V Kh{ri{Si) - x)ei - - V Kh{r2{S^) - x)e. 
n ^-^ n ^-^ 



i=l 



i=l 



sup 

x&lR,ri,r2eMr. 



^JXjiSi)xj 



1 = 1 



-y^Kh{r2{Si)-x) 



hj 



i=l 



h 



■J 



Op{n-^'). 



Proof. We only prove the first statement of the lemma. The second
claim can be shown using essentially the same arguments. Without loss of
generality, we also assume that

(A.7)  $\kappa_1 > (\delta - \eta)_{\min}.$

If $\kappa_1 \le (\delta - \eta)_{\min}$ the statement of the lemma follows from a direct bound.
For $C_1, C_2 > 0$ large enough (see below) we choose $C_\varepsilon$ such that

(A.8)  $\Pr\big(\max_{1 \le i \le n} |\varepsilon_i| > C_\varepsilon \log(n)\big) \le n^{-C_1},$

(A.9)  $|E \varepsilon_i I\{|\varepsilon_i| \le C_\varepsilon \log(n)\}| \le n^{-C_2}.$

With this choice of $C_\varepsilon$ we define

$\Delta_i(r_1, r_2) = (K_h(r_1(S_i) - x) - K_h(r_2(S_i) - x))\varepsilon_i^*$

with

$\varepsilon_i^* = \varepsilon_i I\{|\varepsilon_i| \le C_\varepsilon \log(n)\} - E(\varepsilon_i I\{|\varepsilon_i| \le C_\varepsilon \log(n)\}).$

For the proof of the lemma we apply a chaining argument; compare, for 
example, the proof of Theorem 9.1 in van de Geer (2000). Now for s > 0, let 
A^* „ j be a set of functions chosen such that for each r G Mri,j there exists 
r* e such that \\r - r*||oo < 2"*n~''-' . That is, the functions in A^s.nj 

are the midpoints of a (2~'^n~'^j)-covering of M.n,j- By Assumption 3, the 
set can be chosen such that its cardinality ^M.*^j is at most 

Cexp((2~^?i"^j)~"jn^j). Furthermore, define = x ••• x A^*^^. 

For ri,r2 € Mn we now choose rf,r| G -M-l^n such that ||r| ^ — rij||oo < 
2-Sj^-(5j ^]^(^ — ?'2,j||oo < C2~^n"^^ , for all j. We then consider the chain 

Gn Gn 

s=l s=l 

- A,(rf",ri) + A,(r^",r2), 

where G„ is the smallest integer that satisfies G„ > (1 + cg)(ki — {5 — 
^)min) log(n) / log(2) for a constant > 0. With this choice of G„, we obtain 
that for / = 1,2 



(A. 10) Ti 



1 " 



1=1 



< Clog(n)2-^"n-(^-'')™ < Cii'^'K 



Now for any a> cq define the constant Ca = (S^i 2 "'') ^. It then follows 
that 



Pr 



> Ca2-"^n' 



Gn / 1 " 

<5^Pr sup -^Mr-l\rl] 

s=l \rieM„ " 

Gn ( 1 

< E #-^:-i,n#-^:,n PM - E ^^K''' ^r'') > ^^^-"^ 

s=l V"" i=l 

Gn / , n 

s=l \ i=l 



■T2+T3 
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where the functions r^' , r^' £ A^^-i n ^■^d ' , fj^ ' G ■Ms,n chosen such 
that 



PrflX]A,(rr,rr)>c.2^ 



= max Frl-i2Mr!~\rl)>Ca2-'''n-^A, 
= max Prf -f"Ai(r'"\r'^)>Ca2-'''n-^A. 

rr\rl V^fe J 

We now show that both T2 and T3 tend to zero at an exponential rate: 
(A.ll) Ts <exp(-cn'=), 

(A.12) Tg <exp(-cn'=). 

We only show (A.ll), as the statement (A.12) follows by essentially the same 
arguments. Using Assumption 3, we obtain by application of the Markov 
inequality that 

Gn 

T2 < JJexp((2-"n-''^)-"^n^^) 

s=l j 

X E f exp f 7„,,- V Ai(r*•^ r**'") - jn,sCa2-'''n~^' 



(A.13) 



( exp( 7„,s- VAi(r^'^r;|' 
< C7^exp('^2"°^ n''^"^+«^ - -fn,sCa2-^'n-''A 
JJe^ exp| 7„,s-Ai(rJ''^rJ*'*) 



n 

X 

i=l 



\ n 



where jn,s = c^2(^~'^)''?i~''i+^~''++^(^~'')""" with a constant Cy > 0, smah 
enough. Now the last term on the right-hand side of (A.13) can be bounded 
as follows: 

E('exp('7n,.^A,(rt'^rr'^)))<l + CE(7tn-'A2(rr,rr)) 

(A.14) 
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where we have used that 



1 



1 



n 



< Clog(n)n('^-'')"-"-'^i2-''"+" 

<c 

for n large enough because of (A. 7). Inserting (A. 14) mto (A. 13), we obtain, 
if a and were chosen sufficiently small, that 

s=l ^ j ' 
<C^exp(-c"n'=) 

s=l 

< exp(— cn'^). 
Finally, it follows from a simple argument that 



(A. 15) r4 = Pr sup 

\ri,r2eMn 



i=l 



> n < exp(-cn'^ 



because the set A4q „ can always be chosen such that it contains only a single 
element. 

From (A.IO), (A.ll), (A.12) and (A.15), we thus obtain that 

1 " 
n ^-^ 



sup Pr sup 



(A.16) 



n 

n ^-^ 



>Cn-'^' <exp{-cn''). 



Now for Cj > choose a grid Iji^n of Ir with 0{'nP') points, such that for 
each X S Ir there exists a grid point x* = x*{x) G iR^n such that ||x — < 
j^-cCi _ j£ (j^ jg chosen large enough, this implies that 



(A. 17) sup sup 



-y^Kh{r{Si) - x)e^- - y^Kh{r{Si) - x*)e^ < n" 

for large enough n, with probability tending to one. Furthermore, it follows 
from (A.16) that 



(A. 18) sup sup 



-y^Kh{ri{S,)-x)ei--y^Kh{r2{S,)-x)ei 
The statement of the lemma then follows from (A.8)-(A.9) and (A.17)-(A.18),
if the constants $C_1$ and $C_2$ were chosen large enough. □

Lemma 2. Suppose that the conditions of Theorem 1 hold. Then 



sup 



i=l 
1 " 

-y^Kh{r2{Si) - x) 



TijiSi) -XjX" f ri^i{Si) - xi 



i=l 



r2,j {Si) -Xj^"- f r 2^i{Si) - xi 

h 



for jj = l,...,qj^l and < a + b <2, < a, 6. 

Proof. The lemma follows from 

sup\Kh{ri{s) -x)- Kh{r2{s) - x)\ < Cn-(''-'')™+^+ 

x,s 

for ri,r2 G Mn and from 

n 

-^K,ir{S,)-x) 



sup 



i=l 



< Cn~^+'^+ sup #{i : \roj{Si) - Xj\ < Cn-"^^ for j = l,...,d} 

x&Ir 

= Op(l), 

which follows from a simple calculation. □ 

Lemma 3. Suppose that the assumptions of Theorem 1 hold. For a ran- 
dom variable Rn = Op{\) that neither depends on x nor i, it holds that 



(A.19) 



sup |[mo(ro(S'i)) - "io(x) - mo(a;) {rQ{Si) - x)]Ii{x)\ 

x(^Ijl,l<i<n 



sup 



(A.20) 



1 " 

Kh{r{Si) - x)wi{x)wi{x)'^ 

1=1 
1 " 

- - y^Khiro{Si) - x)wi{x)wi{xY 



i=l 
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(A.21) 



sup 

x£Ir 



1 " 



i=l 



<Rn{n +n 



- x)wi{x)wi{x) 



fR{x)BK 



where $I_i(x) = I\{\|(\hat r(S_i) - x)/h\|_1 \le 1\}$ is an indicator that equals one if $\hat r(S_i) - x$ lies
in the support of the kernel function $K_h$ and zero otherwise, and $B_K = \mathrm{diag}(1, \int u^2 K(u)\,du, \ldots, \int u^2 K(u)\,du)$
is a $(d+1) \times (d+1)$ diagonal matrix.



Proof. Claim (A. 19) follows by a simple calculation. Claim (A. 20) is 
a direct consequence of Lemma 2, and (A.21) follows from standard argu- 
ments from kernel smoothing theory. For the stochastic part, one makes use 
of Lemma 5, given in Appendix A. 7, below. □ 

Lemma 4. Suppose that the assumptions of Theorem 1 hold. Then it 
holds that 

(A.22) sup _ \\^l{x, n) - ^i{x, ra) - E[/i(x, n) - rs)] || = Op(n-'^i), 

(A.23) sup \fi{x)\ = Op(yk^n^(^-^+)/2), 

x&Ir 

where 

n 

(ji[x) = n"^ ^ Kh{r{Si) - x)wi{x)p{Si) 
1=1 

and 

n 

H(x) = ^ Kh{ro{Si) - x)wi{x)p{Si). 

i=l 



Proof. For a proof of (A.22) one proceeds as in Lemma 1. Claim (A.23) 
follows by classical smoothing arguments. Note that we have that E(/i(x, 
ro)) = 0. □ 



A.2. Proof of Proposition 1. In order to prove Proposition 1, we use the
fact that the local polynomial estimator satisfies a certain uniform stochastic
expansion if Assumption 5 holds. In order to present this result, we first
have to introduce a substantial amount of further notation. For simplicity
we assume $g_1 = \cdots = g_p$, and we write $g$ for this joint value and for the vector $(g, \ldots, g)$.
Let $N_i = \binom{i+p-1}{p-1}$ be the number of distinct $p$-tuples $u$ with $u_+ = i$.
Arrange these $p$-tuples as a sequence in a lexicographical order (with the
highest priority given to the last position so that $(0, \ldots, 0, i)$ is the first
element in the sequence, and $(i, 0, \ldots, 0)$ the last element). Let $\tau_i$ denote this
one-to-one mapping, that is, $\tau_i(1) = (0, \ldots, 0, i), \ldots, \tau_i(N_i) = (i, 0, \ldots, 0)$.
For each $i = 1, \ldots, q$, define an $N_i \times 1$ vector $\mu_i(x)$ with its $k$th element
given by $x^{\tau_i(k)}$, and write $\mu(x) = (1, \mu_1(x)^T, \ldots, \mu_q(x)^T)^T$, which is a
column vector of length $N = \sum_{i=0}^{q} N_i$. Let $\nu_i = \int L(u)u^i \, du$ and define $\nu_{n,i}(x) = \int L(u)u^i f_S(x + gu) \, du$.
For $0 \le j,k \le q$, let $M_{j,k}$ and $M_{n,j,k}(x)$ be two $N_j \times N_k$
matrices with their $(l,m)$ elements, respectively, given by

$[M_{j,k}]_{l,m} = \nu_{\tau_j(l)+\tau_k(m)}$ and $[M_{n,j,k}(x)]_{l,m} = \nu_{n,\tau_j(l)+\tau_k(m)}(x).$
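
The ordering of multi-indices used to arrange the local polynomial terms can be sketched as follows. The function name `ordered_tuples` is hypothetical; the sketch assumes the ordering gives highest priority to the last position, so that $(0, \ldots, 0, i)$ comes first and $(i, 0, \ldots, 0)$ comes last.

```python
from itertools import product
from math import comb

def ordered_tuples(i, p):
    """All p-tuples of non-negative integers summing to i, ordered so that
    (0, ..., 0, i) is first and (i, 0, ..., 0) is last."""
    tuples = [u for u in product(range(i + 1), repeat=p) if sum(u) == i]
    # Compare reversed tuples in descending order: the last position has priority.
    return sorted(tuples, key=lambda u: u[::-1], reverse=True)

def count_tuples(i, p):
    """Number of such tuples: binomial(i + p - 1, p - 1)."""
    return comb(i + p - 1, p - 1)
```

For $p = 2$ and $i = 2$ this produces the sequence $(0,2), (1,1), (2,0)$, and the length of each sequence agrees with the binomial count.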



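Now define the $N \times N$ matrices $M_q$ and $M_{n,q}(x)$ by

$M_q = \begin{pmatrix} M_{0,0} & M_{0,1} & \cdots & M_{0,q} \\ M_{1,0} & M_{1,1} & \cdots & M_{1,q} \\ \vdots & \vdots & & \vdots \\ M_{q,0} & M_{q,1} & \cdots & M_{q,q} \end{pmatrix}, \qquad M_{n,q}(x) = \begin{pmatrix} M_{n,0,0}(x) & M_{n,0,1}(x) & \cdots & M_{n,0,q}(x) \\ M_{n,1,0}(x) & M_{n,1,1}(x) & \cdots & M_{n,1,q}(x) \\ \vdots & \vdots & & \vdots \\ M_{n,q,0}(x) & M_{n,q,1}(x) & \cdots & M_{n,q,q}(x) \end{pmatrix}.$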

Finally, denote the first unit g-vector by ei = (1, 0, . . . , 0). With this notation, 
it can be shown along classical lines that the local polynomial estimator f 
admits the following stochastic expansion: 



ris] 



1 " 

''o(^) + -'^^^^^nq{s)K{Si - s)/g)Lg{Si - s)Ci 



(A.24) 



where sup^g/^ \\Rn{s) 
satisfies 



1 



Op{{\-Og{n) / ng^Y^"^) , and Bn is a bias term that 
reiM-'Ajri'+'^s) + 0,(1) = b(a) + 0,(1). 



(A.25) B-M 

To prove the proposition, define the stochastic component and the bias term 
of the expansion (A.24) as f^(s) =n~^ 'Ylll=i^i-^'^nq{^)t^{{Si — s)/g)Lg{Si — s)C,i 

and rsis) = g'^~^^ Bn{s) , respectively. Now the function A can be written as 
A{x) = e^Nh{x)-'E{Kh{ro{S)-x)w{x,r)fA{S)) 

'log(n) 



+ ejNh{x)-'E{Kh{ro{S) - x)w{x, r)fB{S)) + O, 



ngP 
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= A,(x) + A,(.) + oii^), 

V ngP ) 

uniformly over x £ Ir. We first analyze the term Ab{x). Through the usual 
arguments from kernel smoothing theory, one can show for x £ I'^^ that 

As(x) = g'^+'ejNh{x)-'nKh{ro{S) - x)w{x,r)b{S)) + Op{g'^+') 

= g'^+'E{b{S)\ro{S) = x) + Op{g'^+' + n'^^') 

since the function E(6(S')|ro(5) = x) is continuous with respect to x because 
of Assumptions 5 and 6. Explicitly, we have 

E{b{S)\ro{S)=x) 

_ J b{s-p, ^{s-p, x))fs{s-p, ip{s-p, x)) ds_pip{s-p, x) ds-p 
f fs{s-p, ^{s-p, x)) ds_pip{s-p, x) ds-p 

Next, consider the term Ayi(a;). Note that for x £ we have that 

1 

(A.26) KA{x) = ——Y,i,n{x,S,)Cj 

njR{x) ^ 

with 

V'n(a;,s)= / {Kh{ro{u) - x)eiM~g{u)fi{{s - u)/g)Lg{s - u))fs{u)du 
Jls 

= j Kh{rQ{u) - x)L*^ g{s,u- s)du, 

where L*^g{s,t) = fs{s — t)eiM~^{s — t)fi{t/g)Lg{t). Define Ig^ as the set 
that contains all s € Is that do not lie in a ^'-neighborhood of the boundary 
of Is- Uniformly over s £ Ig^, we have that Mn^q{s) — fs{s)Mg = 0(g) . Thus 
for s G Ig^, we have that tpnix, s) = (1 + 0{g))tp{x, s) where the function ip 
is equal to ■iIj{x,s) = J Kii{ro{u) — x)L*{u — s)du with modified kernel L* 
defined as 

(A.27)  $L^*(t) = e_1^T M_q^{-1} \mu(t) L(t).$

Note that $L^*$ is the equivalent kernel of the local polynomial regression
estimator; see Fan and Gijbels (1996), Section 3.2.2. For $q = 0, 1$ the equivalent
kernel is in fact equal to the original one, whereas $L^*(t)$ is equal to $L(t)$
times a polynomial in $t$ of order $q$ for $q \ge 2$, with coefficients such that its
moments up to the order $q$ are equal to zero. The kernel $L^*_{n,g}(u,t)$ has the
same moment conditions in $t$ as $L^*$ but depends on $u$.
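
The moment properties of the equivalent kernel can be verified numerically in the scalar case $p = 1$, where $\mu(t) = (1, t, \ldots, t^q)^T$ and $M_q$ has entries $\int t^{j+k} L(t)\,dt$. The sketch below is an illustration under these assumptions; the triweight kernel, the grid size and the function names are choices made for the example and are not taken from the paper.

```python
import numpy as np

def triweight(t):
    """Symmetric density with support [-1, 1]."""
    return np.where(np.abs(t) <= 1.0, 35.0 / 32.0 * (1.0 - t**2) ** 3, 0.0)

def integrate(f, t):
    """Trapezoidal rule for values f on the grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def equivalent_kernel(t, q, kernel=triweight):
    """L*(t) = e_1' M_q^{-1} mu(t) L(t) for p = 1, with mu(t) = (1, t, ..., t^q)'."""
    grid = np.linspace(-1.0, 1.0, 20001)
    L = kernel(grid)
    M = np.array([[integrate(grid ** (j + k) * L, grid) for k in range(q + 1)]
                  for j in range(q + 1)])
    e1 = np.zeros(q + 1)
    e1[0] = 1.0
    a = np.linalg.solve(M, e1)   # M is symmetric, so this is the first row of M^{-1}
    t = np.atleast_1d(t)
    mu = np.vander(t, N=q + 1, increasing=True)
    return (mu @ a) * kernel(t)
```

Evaluating `equivalent_kernel` on a grid and integrating against $1, t, \ldots, t^q$ recovers total mass one and vanishing moments of orders $1$ to $q$; for $q = 1$ the output coincides with the original kernel, in line with the remark above.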

We now derive explicit expressions for the leading term in equation (A.26) 
for the cases (a)-(c) of the proposition. Starting with case (a), in which 
g/h ^ 0, it follows by substitution and Taylor expansion arguments that 
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with K'f^iv) = h-^K'{h-^v) and K',l{v) = h~^K"{h~^v) 
'4)n{x,v)= I Khirois) -x)L'^ g{s,s-v)ds 

KhiMv-tg) -x)L*^{v-tg,t)dt 

ro{v-tg) -ro(v) 



Kh{ro{v) - x) + K'Mv) - x) 



h 

2 



, j^/// ^1 f ro{v - tg) - ro{v) 



2 V h 

X L^{v - tg,t)dt 
= Kh{ro{v) - x) 

+ Kl{r,{v)-x) I (^-dsroiv)^-^ + d]roU2f^y:{v-tg,t)dt 

where xi; X2 and X3 are intermediate values between ro(v) and rQ^v — tg), v 
and v — tg, and v and v — tg, respectively. This gives an expansion for ipni^, v) 
of order (g/h)^. For v ^ Ig^ one gets an expansion of order g/h. Put kn{v) = 

—dsrQ{v) J tL^{v — tg,t)dt. Together with Lemma 5 in Appendix A. 7, we 
thus obtain that 

1 " 

= ^Tf^ ^ (Kh{ro{S^) -x) + ^K{ro{S,) - x)fc„(5,)) 



hi \ nh 



log(n) 



nfji{x)^^ \\h^\h^J\ nh I 

as claimed. To show statement (b) of the proposition, we rewrite the func- 
tion ^lJn as follows: 



'4'n{x,v) 



I (^Kh{ro{v)-x + dsro{v)th)+K'(^^y',ro{x2)lt^^ 
X L* (u — th,t) dt 
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= Jn,hix,v) + h j K'f^{xi)dlrQ{x2)\t'K{v-th,t)dt, 

where Jn,h{x-,s) = f Kh{rQ{s) — x — dsrQ{s)uh)L^{s — uh,u)du, and xi is 
an intermediate value between rQ{v + gt) and ro{v) +dsro{v)tg, and X2 is 
an intermediate value between v and v + gt. As in the proof of part (a), it 
follows from Lemma 5 in Appendix A. 7 that 



^ n 1 " 



1 " / 



log(n) 



where Jh uses the location independent form of the equivalent kernel L* as 
defined in the text in front of Proposition 1. This implies the desired result. 

Now consider statement (c) of the proposition. In this case, where
$g/h \to \infty$, we can rewrite the function $\psi_n$ as follows:

\[
\psi_n(x,v) = \int K_h(w_p - x)\,
L^*_{n,g}\bigl( (w_{-p}, \varphi(w)),\, (w_{-p} - v_{-p},\, \varphi(w) - v_p) \bigr)\,
\partial_x \varphi(w)\,dw.
\]

From tedious but conceptually simple Taylor expansion arguments similar 
to the ones employed for case (a), and from Lemma 5, one gets that 



^ n 1 fj 

^ E S,)C, = g + Op 



h"^ /log(n) 
where 

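\[
\begin{aligned}
H_{n,g}(x,v) = \int K(t)\, L^*_{n,g}\bigl( &(v_{-p} + g s_{-p},\, G_n(v_{-p}, x; s_{-p}, t)), \\
&(s_{-p},\, G_n(v_{-p}, x; s_{-p}, t) - v_p) \bigr)\,
\partial_x \varphi(v_{-p}, x)\,ds_{-p}\,dt
\end{aligned}
\tag{A.28}
\]

and $G_n(v_{-p}, x; s_{-p}, t) = \varphi(v_{-p}, x) + g s_{-p}\,\partial_{-p}\varphi(v_{-p}, x) + ht\,\partial_x \varphi(v_{-p}, x)$. With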
as defined in the text, we find 

^ n 1 

nfR{x) ^ nfR{x) ^ 



Since $O(h/g) = o(1)$, this completes our proof.
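
The second-order nature of the expansion in case (a) can be illustrated numerically. The sketch below assumes a scalar covariate, a biweight kernel for $K$, an Epanechnikov kernel in place of the interior equivalent kernel, and a hypothetical index function `r0`; none of these choices come from the proof itself. It compares $\psi_n(x,v)$ with the leading term $K_h(r_0(v) - x)$ for two pilot bandwidths, so that halving $g$ should reduce the gap by a factor close to four.

```python
import numpy as np

def biweight(u):
    """Twice differentiable kernel K supported on [-1, 1]."""
    return (15.0 / 16.0) * np.clip(1.0 - u**2, 0.0, None) ** 2

def epanechnikov(t):
    """Stand-in for the interior equivalent kernel (first moment is zero)."""
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)

def r0(v):
    """Hypothetical smooth index function, chosen only for illustration."""
    return v + 0.5 * v**2

def psi(x, v, g, h, n_grid=4001):
    """psi(x, v) = int K_h(r0(v - t g) - x) L(t) dt via a Riemann sum."""
    t = np.linspace(-1.0, 1.0, n_grid)
    dt = t[1] - t[0]
    k_h = biweight((r0(v - t * g) - x) / h) / h
    return np.sum(k_h * epanechnikov(t)) * dt

def leading_term(x, v, h):
    """K_h(r0(v) - x), the zeroth-order term of the expansion."""
    return biweight((r0(v) - x) / h) / h

x, v, h = 0.5, 0.3, 0.5
err_g = psi(x, v, 0.10, h) - leading_term(x, v, h)
err_half = psi(x, v, 0.05, h) - leading_term(x, v, h)
ratio = err_g / err_half  # close to 4 when the gap is of order (g/h)^2
```

The ratio is not exactly four because terms of order $(g/h)^4$ are still present at these bandwidths.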
A.3. Proof of Proposition 2. To show the result, note that

r{x, r) = ejNh{x)-'n(.Kh{r{S) - x) - Kh{ro{S) - x))w{x)p{S)) 

+ Op(n-«V2){l-r,+ )+25-.,)) 

= E(p(5)|r(5) = x)- E(p(5)|ro(5) = x) 
+ Op(n-2'? + n-«i/2)(i-,+ )+25-^,)) 

uniformly over $x \in I_R$ and $r \in \mathcal{M}_n$. Since
$E(\rho(S)|r_0(S)) = 0$ by construction, it suffices to consider the term
$E(\rho(S)|r(S) = x)$. To simplify the exposition, we strengthen Assumption 6
and suppose that in addition to $r_0$ all functions $r \in \mathcal{M}_n$ are
strictly monotone with respect to their last argument, and write $\varphi_r$
for the corresponding inverse function that satisfies
$r(u_{-p}, \varphi_r(u_{-p}, x)) = x$ (without this condition, the notation
would be much more involved, as we would have to consider all regions where
the functions $r \in \mathcal{M}_n$ are piecewise monotone with respect to the
last component separately). Using rules for integrals on manifolds, we derive
the following explicit expression for $E(\rho(S)|r(S) = x)$:

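\[
E(\rho(S)|r(S) = x)
= \frac{\int \rho(s_{-p}, \varphi_r(s_{-p}, x))\, f_S(s_{-p}, \varphi_r(s_{-p}, x))\, \partial_p \varphi_r(s_{-p}, x)\,ds_{-p}}
       {\int f_S(s_{-p}, \varphi_r(s_{-p}, x))\, \partial_p \varphi_r(s_{-p}, x)\,ds_{-p}}.
\]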

Set the numerator of the above expression as $\gamma_1(x,r)$ and the
denominator as $\gamma_2(x,r)$. Then clearly
$\gamma_2(x,\hat r) = f_R(x) + o_p(1)$ uniformly over $x \in I_R$.
Moreover, note that the mapping

\[
r \mapsto \rho(s_{-p}, \varphi_r(s_{-p}, x))\, f_S(s_{-p}, \varphi_r(s_{-p}, x))
\]

is Hadamard differentiable at $r_0$, with derivative

^ dpX{s.p,^{s.p,x)) ^ ^ 
dpro{s-p,(p{s-p,x)) ^' ^' 

It follows with $\gamma_1(x, r_0) = 0$ that

= / 1^1^^^^^"^'-^' - 
X {d^pipr{s-p,x))ds-p 
+ Op{\\r-ro\\l). 

We evaluate the term $\gamma_1(x,\hat r)$, substitute the uniform expansion (A.24) for
$\hat r(s) - r_0(s)$ into the explicit expression derived above, and use standard
arguments from kernel smoothing theory. This gives the desired expansion 
for Ta- The form of follows from the same arguments used to derive the 
form of in the proof of Proposition 1. 
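
The differentiability argument above rests on how the inverse function $\varphi_r$ reacts to a small perturbation of $r$: differentiating $r(s_{-p}, \varphi_r(s_{-p},x)) = x$ suggests a first-order change of $-(r - r_0)/\partial_p r_0$, evaluated at $\varphi_{r_0}$. The following sketch checks this numerically in a scalar setting. The functions `r0` and `direction` and the bisection solver are hypothetical and serve only as an illustration.

```python
import math

def r0(s):
    """Hypothetical strictly increasing index function (last argument only)."""
    return s + s**3

def r0_prime(s):
    """Derivative of r0 with respect to its argument."""
    return 1.0 + 3.0 * s**2

def direction(s):
    """Hypothetical perturbation direction d, so that r = r0 + eps * d."""
    return math.sin(s)

def inverse(r, x, lo=0.0, hi=2.0, iterations=100):
    """Solve r(s) = x by bisection, assuming r is increasing on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if r(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 2.0
eps = 1e-4
phi0 = inverse(r0, x)                                   # solves s + s^3 = 2
phi_eps = inverse(lambda s: r0(s) + eps * direction(s), x)
finite_difference = (phi_eps - phi0) / eps
first_order = -direction(phi0) / r0_prime(phi0)         # -d / (d r0 / ds)
```

The finite difference and the first-order formula agree up to a term of order `eps`, in line with a remainder that is quadratic in $r - r_0$.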
A.4. Proofs of Corollaries 1-4. The statements of these corollaries follow
by direct application of Propositions 1-2 and Theorem 1. The statement
of Corollary 1 is immediate. For Corollaries 2-4, we only have to check
that the error bounds in Theorem 1 and Propositions 1-2 are of the desired
order. We only discuss how the constants $\alpha$, $\delta$ and $\xi$ can be
chosen. Note that all these constants have no subindex because we only
consider the case $d = 1$. We apply Theorem 1 conditionally on the values of
$S_1, \ldots, S_n$. Then the only randomness in the pilot estimation comes
from $\zeta_1, \ldots, \zeta_n$. We can decompose $\hat r$ into
$\hat r_A + \hat r_B$, where $\hat r_A$ is the local polynomial fit to
$(S_i, \zeta_i)$, and $\hat r_B$ is the local polynomial fit to
$(S_i, r_0(S_i))$. Conditionally given $S_1, \ldots, S_n$, the value of
$\hat r_B$ is fixed, and for checking Assumption 3, we only have to consider
entropy conditions for sets of possible outcomes of $\hat r_A$. We will show
that with $\alpha = p/k$ one can choose for $\delta$ and $\xi$ any value that
is larger than $(1 - p\theta)/2$ or $-pk^{-1}(1 - p\theta)/2 + p\theta$,
respectively. Note that then $\alpha < 2$ because of Assumption 4(iii). It can
be easily checked that we get the desired expansions in Corollaries 1 and 2
with these choices of $\alpha = p/k$, $\delta$ and $\xi$ (with $\delta$ and
$\xi$ small enough). In particular note that we can make $\delta\alpha + \xi$
as close to $p\theta$ as we like.
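
The last claim can be verified by a short computation. As a sketch, write $\alpha = p/k$ and take the two constants at their stated lower bounds plus small slacks $\epsilon_\delta, \epsilon_\xi > 0$ (these slack symbols are introduced here only for illustration):

\[
\begin{aligned}
\delta\alpha + \xi
&= \Bigl( \frac{1 - p\theta}{2} + \epsilon_\delta \Bigr) \frac{p}{k}
 + \Bigl( -\frac{p}{k}\,\frac{1 - p\theta}{2} + p\theta + \epsilon_\xi \Bigr) \\
&= p\theta + \frac{p}{k}\,\epsilon_\delta + \epsilon_\xi .
\end{aligned}
\]

The terms involving $(1 - p\theta)/2$ cancel, so $\delta\alpha + \xi$ exceeds $p\theta$ only by the slack $p\epsilon_\delta/k + \epsilon_\xi$, which can be made arbitrarily small.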

It is clear that Assumption 2 holds for this choice of $\delta$. This follows
by standard smoothing theory for local polynomials. Compare also Lemma 5
and the proof of Proposition 1. It remains to check Assumption 3. It suffices
to check the entropy conditions for the tuple of functions
$(n^{-1} \sum_{i=1}^n L_h(S_i - s)[(S_i - s)/g]^{\pi} \zeta_i : 0 \le \pi_+ \le q,\ \pi_j \ge 0$
for $j = 1, \ldots, p)$. This follows because we get $\hat r_A$ by multiplying
this tuple of functions with a (stochastically) bounded vector. We now argue
that all derivatives of order $k$ of the functions
$n^{-1} \sum_{i=1}^n L_h(S_i - s)[(S_i - s)/g]^{\pi} \zeta_i$ can be bounded by
a variable $B_n$ that fulfills $B_n \le b_n = n^{\xi^{**}}$ with probability
tending to one. Here $\xi^{**}$ is a number with
$\xi^{**} > -\frac{1}{2}(1 - p\theta) + k\theta$. This bound holds uniformly in
$s$ and $\pi$. Furthermore, the functions
$n^{-1} \sum_{i=1}^n L_h(S_i - s)[(S_i - s)/g]^{\pi} \zeta_i$ can be bounded by
a variable $A_n$ that fulfills $A_n \le a_n = n^{\xi^{*}}$ with probability
tending to one. Here $\xi^{*}$ is a number with
$\xi^{*} > -\frac{1}{2}(1 - p\theta)$. Again, this bound holds uniformly in $s$
and $\pi$. We now consider the set of functions on $I_S$ that are absolutely
bounded by $a_n$ and that have all partial derivatives of order $k$ absolutely
bounded by $b_n$. We argue that this set can be covered by
$C \exp(\lambda^{-p/k} b_n^{p/k})$ balls with $\|\cdot\|_\infty$-radius
$\lambda$ for $\lambda \le a_n$. Here the constant $C$ does not depend on $a_n$
and $b_n$. This entropy bound shows that Assumption 3 holds with these choices
of $\alpha$, $\delta$ and $\xi$. For the proof of the entropy bound one applies
an entropy bound for the set of functions on $I_S$ that are absolutely bounded
by 1 and that have all partial derivatives of order $k$ absolutely bounded by
1. This set can be covered by $C \exp(\lambda^{-p/k})$ balls with
$\|\cdot\|_\infty$-radius $\lambda$ for $\lambda \le 1$. The desired entropy
bound follows by rescaling of the functions. Note that we have that
$b_n^{-1} a_n \to 0$.
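
The rescaling step can be spelled out as follows. This is only a sketch of one way to carry it out. The class notation $\mathcal{F}(a,b)$ for functions on $I_S$ bounded by $a$ with all order-$k$ partial derivatives bounded by $b$, and the covering number notation $N(\lambda, \cdot, \|\cdot\|_\infty)$, are introduced here for illustration. Since $b_n^{-1}a_n \to 0$, we have $a_n \le b_n$ for large $n$, and therefore $\mathcal{F}(a_n, b_n) \subset b_n\,\mathcal{F}(1,1)$. Hence, for $\lambda \le a_n \le b_n$,

\[
\begin{aligned}
N(\lambda, \mathcal{F}(a_n, b_n), \|\cdot\|_\infty)
&\le N(\lambda, b_n \mathcal{F}(1,1), \|\cdot\|_\infty)
 = N(\lambda / b_n, \mathcal{F}(1,1), \|\cdot\|_\infty) \\
&\le C \exp\bigl( (\lambda / b_n)^{-p/k} \bigr)
 = C \exp\bigl( \lambda^{-p/k}\, b_n^{p/k} \bigr).
\end{aligned}
\]

With $b_n = n^{\xi^{**}}$ the exponent equals $\lambda^{-p/k} n^{\xi^{**} p/k}$, which corresponds to $\alpha = p/k$ and $\xi = \xi^{**} p/k > -pk^{-1}(1 - p\theta)/2 + p\theta$.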
A.5. Proof of Corollary 5. Our proof has the same structure as the one
provided by Lewbel and Linton (2002), but making use of Theorem 1
considerably simplifies some of their arguments. First, note that the
restriction that $\underline{\theta} < \theta < \bar{\theta}$ implies that
$(ng^p)^{1/2} h^2 \to 0$ and $(ng^p)^{1/2} g^{q+1} \to 0$. From
a second-order Taylor expansion, we furthermore obtain that

fi{x) - noix) = ^ .A r{x) - ro{x)) 



, Q{s)-qo{s) q'{r{x)) 



2 



r(x) q{s)qo{s. 



i2 



g r x -go r(x % 
q{r[x))qo{r{x)) 
= Ti+T2 + T3 + Ti + T5, 

where $\bar r(x)$ and $\tilde r(x)$ are intermediate values between
$\hat r(x)$ and $r_0(x)$. Now it
follows from standard arguments for local linear estimators that

a^{x) 




since $s_0(x) = q_0(r_0(x))$. To prove the corollary, it thus only remains to
be shown that the remaining four terms in the above expansion are of smaller
order than $T_1$. Under the conditions of the corollary, it is easy to show
with straightforward rough arguments that $\inf \hat q(s) > 0$,
$\sup \hat q'(s) = O_p(1)$ and
$\sup |\hat q(s) - q_0(s)|^2 = o_p((ng^p)^{-1/2})$, where the supremum and
infimum are taken over $s \in (r_0(x) - \epsilon, \lambda_0 + \epsilon)$ for
some $\epsilon > 0$, respectively. This directly implies that
$T_3 + T_4 + T_5 = o_p((ng^p)^{-1/2})$. Now consider the term $T_2$. From
Theorem 1, we obtain that

ro{x) qo[sY y^^(^) go(s)^ 

where $\tilde q(x)$ is the oracle estimator of the function $q$ obtained via
local linear regression of $I\{Y > 0\}$ on $r_0(X)$, and $\hat\Delta(s)$ and
$\hat\Gamma(x)$ are the adjustment terms that appear in the main expansion in
Theorem 1, with the necessary adjustments to the notation. Using similar
arguments as in the proofs of Propositions 1-2 and Corollaries 2-4, and the
restriction that $\underline{\theta} < \theta < \bar{\theta}$, we
obtain that

r(x) ris) /ij(ro(Xi)) 

Oj,{n-'/^)+Op{h')=Op{ingP)-'/') 
for $\varepsilon_i = I\{Y_i > 0\} - q_0(X_i)$, and similarly that

Jrix) Qoisr \ngP J 

Thus $T_2 = o_p((ng^p)^{-1/2})$. Finally, straightforward calculations show
that $\underline{\theta} < \theta < \bar{\theta}$ also implies that
$O_p(n^{-\kappa}) = o_p((ng^p)^{-1/2})$. This completes the proof.

A.6. Proof of Corollary 6. Let $\hat f = (\hat m, \hat\mu_2)$ and
$f = (m, \mu_2)$, define the functional $S_n(f)$ as

\[
S_n(f) = \frac{1}{n} \sum_{i=1}^n f_1(x_1, z_1, X_{1i} - f_2(Z_i)) - \mu_1(x_1, z_1),
\]

and let $S_n'(f)[h] = \lim_{t \to 0} (S_n(f + th) - S_n(f))/t$ denote its
directional derivative. One then obtains through direct calculations that for
any $\bar f = (f_{1,A} + f_{1,B}, \bar f_2)$ with bounded second derivatives
we have that

\[
\begin{aligned}
&\| S_n(\bar f) - S_n(f) - S_n'(f)[\bar f - f] \|_\infty \\
&\quad = O(\| \bar f_2 - f_2 \|_\infty^2)
 + O(\| \bar f_2 - f_2 \|_\infty \| f_{1,A}^{(v)} - f_1^{(v)} \|_\infty)
 + O(\| f_{1,B} \|_\infty),
\end{aligned}
\]

where $f_{1,A}^{(v)}(x_1, z_1, v) = \partial_v f_{1,A}(x_1, z_1, v)$. Using the
same kind of arguments as in the proof of Proposition 1, under the conditions
of the corollary one can derive the following stochastic expansion of
$\hat m$ up to order $o_p((nh^{1+d_1})^{-1/2})$, uniformly over
$(x_1, z_1, v)$ in the $h$-interior of the support of $(X_1, Z_1, V)$:

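\[
\begin{aligned}
\hat m(x_1, z_1, v) - m(x_1, z_1, v)
&= \frac{1}{n f_R(x_1, z_1, v)} \sum_{i=1}^n K_h\bigl( (X_{1i}, Z_{1i}, V_i) - (x_1, z_1, v) \bigr) \varepsilon_i \\
&\quad + o_p((nh^{1+d_1})^{-1/2}),
\end{aligned}
\tag{A.29}
\]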

where $\varepsilon_i = Y_i - m(X_{1i}, Z_{1i}, V_i)$. A similar, but
notationally more involved expansion can be derived for values of
$(x_1, z_1, v)$ in the proximity of the boundary. Note that the exclusion
restriction on the instruments, $E(U|Z_1, Z_2, V) = E(U|V)$, implies that
$E(\varepsilon|Z_1, Z_2, V) = 0$. In the notation of Theorem 1, this means
that $\rho(s) = 0$, and hence the term corresponding to $\hat\Gamma(x)$ is
equal to zero and does not need to be considered.

Now let $f_{1,A}$ denote the sum of the function $m$ and the leading term of
the expansion (A.29), and denote the remainder term by $f_{1,B}$. Then it
follows from, for example, Masry (1996) and the conditions on $\eta$ and
$\theta$, that

11/2 - /2II00 = Op((log(n)/(n/^+'^2))i/2) = Opi{nh'+''-)-'/% 
and it follows from the same result together with Lemma 5 in Appendix A.7
that

11/2 - /2II00II/S - fVWoo = Op(log(n)/(n2/i3+'^^/i+'^^)V2) 

= o,((n/ii+'^^)-V2). 
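For any fixed values $(x_1, z_1)$ we thus have that

\[
\hat\mu_1(x_1, z_1) - \mu_1(x_1, z_1) = S_n(\hat f)
= S_n(f) + T_{1,n} + T_{2,n} + o_p((nh^{1+d_1})^{-1/2}),
\]

where

\[
\begin{aligned}
T_{1,n} &= -\frac{1}{n} \sum_{i=1}^n m^{(v)}(x_1, z_1, V_i)\,(\hat\mu_2(Z_i) - \mu_2(Z_i)), \\
T_{2,n} &= \frac{1}{n} \sum_{i=1}^n \bigl( \hat m(x_1, z_1, V_i) - m(x_1, z_1, V_i) \bigr).
\end{aligned}
\]
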

Since $S_n(f)$ is a simple sample average of i.i.d. mean zero random
variables, one can directly see that
$S_n(f) = O_p(n^{-1/2}) = o_p((nh^{1+d_1})^{-1/2})$. Using a stochastic
expansion for $\hat\mu_2$ as in the proof of Proposition 1, and applying
projection arguments for U-statistics, one also finds that
$T_{1,n} = O_p(n^{-1/2}) = o_p((nh^{1+d_1})^{-1/2})$. Now consider the term
$T_{2,n}$. From the expansion in (A.29), it follows that for any fixed values
$(x_1, z_1)$ we have that

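\[
\begin{aligned}
T_{2,n} &= \frac{1}{n} \sum_{j=1}^n \frac{1}{f_R(x_1, z_1, V_j)}\,
\frac{1}{n} \sum_{i=1}^n K_h\bigl( (X_{1i}, Z_{1i}, V_i) - (x_1, z_1, V_j) \bigr) \varepsilon_i \\
&\quad + o_p((nh^{1+d_1})^{-1/2}).
\end{aligned}
\tag{A.30}
\]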
This in turn implies that 

V^T2,^An(o,e(-^^^^^) jkitfdt) 

V \fxz^\v{xi^zi,V) J J J 
using again projection arguments for U-statistics. 

A.7. Uniform rates for generalized kernels. The following auxiliary lemma
states uniform rates for averages of i.i.d. mean zero random variables weighted
by "kernel-type" expressions. It is used in the proofs of several of our results.
Modifications of the lemma are well known in the smoothing literature; see,
for example, Härdle, Janssen and Serfling (1988). The lemma can be proved
by standard smoothing arguments. One can proceed by using a Markov
inequality as in the proof of Lemma 1, but without making use of a chaining
argument.

Lemma 5. Assume that D C M*^"^ is a compact set, and Wn^h is a kernel- 
type function that satisfies Wn,h{u,z) = for \\u — t{z)\\ > bnh for some 
deterministic sequence <b < \bn\ < B < 00, and t iM*^^ — )• a continu- 
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ously differentiable function, for any u € D and z G M . Furthermore, as- 
sume that \Wn,h{u,z) -Wn,h{v,z)\ < With SUp^Wn 

bounded, and that $E[\exp(\rho|\varepsilon|) \mid S] \le C$ a.s. for a constant
$C > 0$ and $\rho > 0$ small enough. Then we have that



sup 



1 



n 



i=l 



'log(n) 



n 



for any deterministic sequence $a_n$ with $|a_n| \le A$.
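
The type of uniform rate described by the lemma can be illustrated by simulation. The sketch below makes several simplifying assumptions that are not taken from the lemma itself: a scalar covariate that is uniform on $[0,1]$, standard normal errors, an ordinary Epanechnikov kernel as the "kernel-type" weight, the normalisation $n^{-1}\sum_i h^{-1}K((S_i - x)/h)\varepsilon_i$, and the benchmark rate $\sqrt{\log(n)/(nh)}$. It reports the supremum of the weighted average over a grid, divided by this benchmark.

```python
import numpy as np

def epanechnikov(t):
    """Compactly supported kernel used as the illustrative weight function."""
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)

def sup_kernel_average(n, h, rng, n_eval=200):
    """sup over a grid of x of |n^{-1} sum_i K_h(S_i - x) eps_i|."""
    s = rng.uniform(0.0, 1.0, size=n)
    eps = rng.standard_normal(n)  # mean zero errors with light tails
    x = np.linspace(0.0, 1.0, n_eval)
    weights = epanechnikov((s[None, :] - x[:, None]) / h) / h
    averages = weights @ eps / n
    return np.max(np.abs(averages))

rng = np.random.default_rng(0)
results = []
for n in (500, 2000, 8000):
    h = n ** (-1.0 / 5.0)                    # illustrative bandwidth choice
    rate = np.sqrt(np.log(n) / (n * h))      # benchmark uniform rate
    results.append(sup_kernel_average(n, h, rng) / rate)
```

If the benchmark rate is of the right order, the entries of `results` should stay bounded as `n` grows rather than drift upwards.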